Semantic Web Publications - Texts as Data Services (Severi)

Project Goals

The project develops automatic annotation technology and tools by which texts can be transformed into Linked Data services. The methods are tested and evaluated in practice by developing application demonstrators on top of the data services in four case study areas, along with two separately funded ones:

  1. Legal texts in the context of the Semantic Finlex project
  2. Norms in use in the construction industry
  3. Business news about law and technology innovations
  4. Publishing biographical materials on the Semantic Web
  5. Semantic media tracking in news, funded separately by VTS foundation
  6. Improving findability in web marketing using Schema.org, funded separately by VTS foundation

Research Plan

The project runs from September 1, 2016 to May 31, 2018. More detailed materials about the project and its results will be published on this home page later. An abstract, translated from the Finnish original, follows:

The WWW is evolving from a traditional platform for publishing documents (Web of Documents) into a platform for publishing data (Web of Data). The idea is to publish media content on the web not only as human-readable text but also as machine-readable data, which makes it possible to develop applications and create added value through new kinds of service concepts and business models. The technological challenge, however, is structuring textual materials into data, which requires a multidisciplinary combination of language technology and Semantic Web technology.

The Severi project creates an open technological foundation and a collaboration network for publishing text-based web content as semantic data services. The research is carried out through case studies of the company consortium participating in the project, with legal materials, construction industry norms, news, and e-books as application areas. The results of the project are published as web services and under an open license to maximize their use in Finland. The project also involves a broad international collaboration network of top universities.

Consortium

The project consortium includes the following organizations:

  1. Aalto University, Department of Computer Science
  2. Edita Publishing Ltd
  3. CSC Ltd
  4. Heldig - Helsinki Centre for Digital Humanities
  5. Lingsoft Ltd
  6. Ministry of Justice
  7. Building Information Group Ltd
  8. Finnish Literature Society (SKS)
  9. Svenska litteratursällskapet i Finland (SLS)
  10. Tekniikan akateemiset TEK
  11. YLE Ltd

Thanks to Tekes for making the project financially possible.

The project Steering Group includes the following representatives: Sari Korhonen (Edita), Pirjo-Leena Forsström (CSC), Tiina Lindh-Knuutila and Juhani Reiman (Lingsoft), Aki Hietanen (Ministry of Justice), Jouko Kanerva (Building Information Group), Kirsi Keravuori (SKS), Karola Söderman (SLS), Pekka Pellinen (TEK), Pia Virtanen (YLE), and Eero Hyvönen (Aalto). Aki Parviainen is the project representative at Tekes.

Contact Person

Prof. Eero Hyvönen, Director, Aalto University and University of Helsinki, Heldig


Publications

year: other

Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä, Minna Tamper, Aki Hietanen and Eero Hyvönen: Law and Justice as a Linked Open Data Service. Submitted.
Everybody is expected to know and obey the law in today’s society. Governments therefore publish legislation and case law widely in print and on the web. Such legal information is provided for human consumption, but the information is usually not available as data for algorithmic analysis and applications to use. However, this would be beneficial in many use cases, such as building more intelligent juridical online services and conducting research into legislation and legal practice. To address these needs, this paper presents Semantic Finlex, a national in-use data resource and system for publishing Finnish legislation and related case law as a Linked Open Data service with applications. The system transforms and interlinks on a regular basis data from the legacy legal database Finlex of the Ministry of Justice into Linked Open Data, based on the new European standards ECLI and ELI. The data is hosted on a “7-star” SPARQL endpoint with a variety of related services available that ease data re-use. Rich Internet Applications using only SPARQL for data access are presented as first application demonstrators of the data service.

2017

Petri Leskinen, Eero Hyvönen and Jouni Tuominen: Analyzing and Visualizing Prosopographical Linked Data Based on Short Biographies. Biographical Data in a Digital World 2017 (BD2017), Linz, Austria, November, 2017.
Jouni Tuominen, Eero Hyvönen and Petri Leskinen: Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research. Biographical Data in a Digital World 2017 (BD2017), Linz, Austria, November, 2017.
Petri Leskinen, Jouni Tuominen, Erkki Heino and Eero Hyvönen: An Ontology and Data Infrastructure for Publishing and Using Biographical Linked Data. Proceedings of the Workshop on Humanities in the Semantic Web (WHiSe II), CEUR Workshop Proceedings, Vienna, Austria, October, 2017.
This paper describes the ontology model and published datasets of a digitized biographical person register. The applied ontology model is designed to represent people via their enduring roles and perduring lifetime events. The model is designed to support 1) prosopographical Digital Humanities research, 2) linking to resources in semantic Cultural Heritage portals, and 3) semantic data validation and enrichment by using SPARQL queries. The linked data approach makes it possible to enrich a person's biography by interlinking it with space- and time-related biographical events, persons related by social contacts or family relations, historical events, and personal achievements.
Erkki Heino, Minna Tamper, Eetu Mäkelä, Petri Leskinen, Esko Ikkala, Jouni Tuominen, Mikko Koho and Eero Hyvönen: Named Entity Linking in a Complex Domain: Case Second World War History. Proceedings, Language, Data, and Knowledge (LDK 2017), pp. 120-133, Springer-Verlag, Galway, Ireland, June, 2017.
This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of 1) military units, 2) places and 3) people in the context of rich Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced, and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data as well as the target authority.
Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuominen and Laura Sirola: Reassembling and Enriching the Life Stories in Printed Biographical Registers: Norssi High School Alumni on the Semantic Web. Proceedings, Language, Data, and Knowledge (LDK 2017), pp. 113-119, Springer-Verlag, Galway, Ireland, June, 2017.
This paper presents the idea to enrich printed biographical person registers with linked data related to events that took place after the register was published. By transforming printed historical documents into structured data, semantic search to written texts can be provided for the reader. Even more importantly, life stories of historical persons can be extended based on data linking by extracting semantic structures from printed texts, and by combining this data with external datasets and data services. Such linking provides an enriched context for prosopographical research on people in the register, as well as an enhanced reading experience for anyone interested in reading the biographies. As a concrete case study, a register 1867–1992 of over 10 000 alumni of the prominent Finnish high school “Norssi” was transformed into RDF, was enriched by data linking, was published as a linked data service, and is provided to end users via a faceted search engine and browser for studying lives of historical persons and for prosopographical research.
Minna Tamper, Petri Leskinen, Esko Ikkala, Arttu Oksanen, Eetu Mäkelä, Erkki Heino, Jouni Tuominen, Mikko Koho and Eero Hyvönen: AATOS – a Configurable Tool for Automatic Annotation. Proceedings, Language, Data, and Knowledge (LDK 2017), pp. 276-289, Springer-Verlag, Galway, Ireland, June, 2017.
This paper presents an automatic annotation tool AATOS for providing documents with semantic annotations. The tool links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different natural language Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.
Eero Hyvönen, Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä and Minna Tamper: Semanttinen Finlex. Laki ja oikeus avoimena linkitettynä datana. (Semantic Finlex. Law and Justice as Linked Open Data.). Oikeus-lehti, vol. 46, no. 1, March, 2017.