phone: +358 50 431 6071 (office)
room: B126 @ Department of Computer Science, Computer Science building, Konemiehentie 2, Espoo
postal address: Department of Computer Science,
P.O. Box 15400, FI-00076 Aalto, Finland
Currently working in the Severi project.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: BiographySampo – A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web. November, 2018.
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the web, based on Linked Data. Firstly, a biographical dictionary on the web should provide the end user with an enhanced reading experience of biographies by enriching them with data linking and reasoning. Secondly, the web publication should include not only biographies for humans to read but also versatile tooling for 1) biographical research of individual persons as well as for 2) prosopographical research on groups of people. To support these arguments, we present the design principles and the implementation of the semantic portal “BiographySampo – Finnish Life Stories on the Semantic Web”, especially from the end user’s point of view. The system is based on a Linked Data service and knowledge graph extracted automatically from a collection of 13 100 textual biographies, written by 900 researchers. The texts are enriched with data linking to 16 external data sources and by harvesting external collection data from libraries, museums, and archives. The portal, consisting of seven different interlinked application perspectives, was released on September 27, 2018, for free public use for Digital Humanities researchers and the general public.
Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä, Minna Tamper, Aki Hietanen and Eero Hyvönen: Law and Justice as a Linked Open Data Service. 2017. Submitted.
Everybody is expected to know and obey the law in today’s society. Governments therefore publish legislation and case law widely in print and on the web. Such legal information is provided for human consumption, but the information is usually not available as data for algorithmic analysis and applications to use. However, this would be beneficial in many use cases, such as building more intelligent juridical online services and conducting research into legislation and legal practice. To address these needs, this paper presents Semantic Finlex, a national in-use data resource and system for publishing Finnish legislation and related case law as a Linked Open Data service with applications. The system regularly transforms and interlinks data from the legacy legal database Finlex of the Ministry of Justice into Linked Open Data, based on the new European standards ECLI and ELI. The data is hosted on a “7-star” SPARQL endpoint with a variety of related services available that ease data re-use. Rich Internet Applications using only SPARQL for data access are presented as first application demonstrators of the data service.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: Biografiasammon tekoäly yhdistää ja rikastaa suomalaiset elämäkerrat semanttisessa webissä (in Finnish; English title: The artificial intelligence of BiographySampo connects and enriches Finnish biographies on the Semantic Web). Aalto University, Semantic Computing Research Group (SeCo), Nov, 2018.
The BiographySampo system launches a new era in publishing and using biography collections on the web. The core material of the system is the National Biography of Finland together with other short biographies edited by the Finnish Literature Society (SKS) and learned societies, in total 13 100 life stories written by 900 Finnish researchers. The innovation of BiographySampo is to use language technology, artificial intelligence, and Semantic Web technologies to create, from the biography texts and from related information in various sources, a knowledge graph and a national data infrastructure consisting of millions of connections between pieces of data. The knowledge graph has been published in a linked data service, on top of which the intelligent web service biografiasampo.fi, consisting of seven application perspectives, has been implemented. The service is open to all and free of charge, for use by citizens and Digital Humanities researchers.
Minna Tamper, Petri Leskinen, Kasper Apajalahti and Eero Hyvönen: Using Biographical Texts as Linked Data for Prosopographical Research and Applications. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, Springer-Verlag, November, 2018.
Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä, Minna Tamper, Aki Hietanen, and Eero Hyvönen: Semantic Finlex: Finnish Legislation and Case Law as a Linked Open Data Service. Proceedings of Law via the Internet 2018 (LVI 2018), Knowledge of the Law in the Big Data Age, abstracts, Florence, Italy, October, 2018.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen and Kirsi Keravuori: Semantic National Biography of Finland. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), pp. 372-385, CEUR Workshop Proceedings, Vol-2084, Helsinki, Finland, March, 2018.
This paper presents the vision of publishing and utilizing textual biographies as Linked (Open) Data on the Semantic Web. As a case study, we publish the life stories of the National Biography of Finland, created by the Finnish Literature Society, as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint using the Linked Data Finland (LDF.fi) service. On top of the data service various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research of groups of persons using methods of prosopography. The biographical data is enriched by extracting events from unstructured and semi-structured texts, and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for prosopographical research. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional data sources, such as museum collections, library databases, and archives.
Petri Leskinen, Mikko Koho, Erkki Heino, Minna Tamper, Esko Ikkala, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen: Modeling and Using an Actor Ontology of Second World War Military Units and Personnel. Proceedings of the 16th International Semantic Web Conference (ISWC 2017) (Claudia d'Amato, Miriam Fernandez, Valentina Tamma, Freddy Lecue, Philippe Cudré-Mauroux, Juan Sequeda, Christoph Lange and Jeff Heflin (eds.)), pp. 280-296, Springer-Verlag, Vienna, Austria, October, 2017.
This paper presents a model for representing historical military personnel and army units, based on large datasets about World War II in Finland. The model is in use in the WarSampo data service and semantic portal, which has had tens of thousands of distinct visitors. A key challenge is how to represent ontological changes, since the ranks and units of military personnel, as well as the names and structures of army units, change rapidly in wars. This leads to serious problems in both search and data linking due to ambiguity and homonymy of names. In our solution, actors are represented in terms of the events they participated in, which facilitates disambiguation of personnel and units in different spatio-temporal contexts. The linked data in the WarSampo Linked Open Data cloud and service has ca. 9 million triples, including actor datasets of ca. 100 000 soldiers and ca. 16 100 army units. To test the model in practice, an application for semantic search and recommending based on data linking was created, where the spatio-temporal life stories of individual soldiers can be reassembled dynamically by linking data from different datasets. An evaluation is presented showing promising results in terms of linking precision.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo: Publishing and Using Linked Open Data about the Second World War. EuropeanaTech Insight, no. 7, Europeana, September, 2017.
The article overviews the system WarSampo – Finnish World War 2 on the Semantic Web, the winner of the LODLAM Challenge 2017 Open Data Prize on June 29 in Venice, Italy.
Erkki Heino, Minna Tamper, Eetu Mäkelä, Petri Leskinen, Esko Ikkala, Jouni Tuominen, Mikko Koho and Eero Hyvönen: Named Entity Linking in a Complex Domain: Case Second World War History. Proceedings, Language, Data and Knowledge (LDK 2017), pp. 120-133, Springer-Verlag, Galway, Ireland, June, 2017.
This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of 1) military units, 2) places and 3) people in the context of rich Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data and the target authority.
Minna Tamper, Petri Leskinen, Esko Ikkala, Arttu Oksanen, Eetu Mäkelä, Erkki Heino, Jouni Tuominen, Mikko Koho and Eero Hyvönen: AATOS – a Configurable Tool for Automatic Annotation. Proceedings, Language, Data and Knowledge (LDK 2017), pp. 276-289, Springer-Verlag, Galway, Ireland, June, 2017.
This paper presents an automatic annotation tool AATOS for providing documents with semantic annotations. The tool links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different natural language Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.
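As an illustration of the evaluation described above, the following is a minimal sketch of how precision and recall can be computed by comparing automatic annotations with manual ones. It is not taken from AATOS: the function name `precision_recall`, the representation of annotations as (document id, concept label) pairs, and the sample data are all assumptions made for this example.

```python
def precision_recall(predicted, gold):
    """Compare automatic annotations with manual (gold) ones.

    Both arguments are collections of hashable annotation items;
    here each item is a hypothetical (document id, concept label) pair.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of automatic annotations that are also in the manual set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of manual annotations that the tool also produced.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample data, invented for illustration only.
automatic = {
    ("doc1", "Helsinki"),
    ("doc1", "Mannerheim"),
    ("doc2", "Viipuri"),
    ("doc2", "Karelia"),
}
manual = {
    ("doc1", "Helsinki"),
    ("doc2", "Viipuri"),
    ("doc2", "Ladoga"),
}

precision, recall = precision_recall(automatic, manual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating annotations as sets means that a concept counts once per document, however many times it occurs in the text; an evaluation that scores individual mentions would need a different representation.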
Minna Tamper: Extraction of Entities and Concepts from Finnish Texts. MSc Thesis (in English), Aalto University, School of Science, Degree Programme in Computer Science and Engineering, Dec, 2016.
Keywords are used in many document databases to improve search. The process of assigning keywords from controlled vocabularies to a document is called subject indexing. If the controlled vocabulary used for indexing is an ontology, with semantic relations and descriptions of concepts, the process is also called semantic annotation. In this thesis an automatic annotation tool was created to provide the documents with semantic annotations. The application links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: Publishing Second World War History as Linked Data Events on the Semantic Web. Proceedings of Digital Humanities 2016, short papers, pp. 571-573, Kraków, Poland, July, 2016.
Data about wars is typically heterogeneous, distributed in the data silos of the fighting parties, multilingual, and often controversial depending on the political point of view. It is therefore hard for the historians to get a global picture of what has actually happened, to whom, where, when, and how. We argue that Semantic Web and Linked Data technologies are a very promising approach for modeling, harmonizing, and aggregating data about war history. Our goal is to make it possible, for both historians and laymen, to study history in a contextualized way where linked datasets enrich each other. The paper presents the in-use WarSampo system, where massive collections of heterogeneous data about the (Finnish) history of the Second World War are harmonized using an event-based approach, and provided as a Linked Open Data service for applications to use. As a use case, a semantic portal WarSampo providing six different perspectives to the war based on events is presented.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo Data Service and Semantic Portal for Publishing Linked Open Data about the Second World War History. The Semantic Web – Latest Advances and New Domains (ESWC 2016) (Harald Sack, Eva Blomqvist, Mathieu d'Aquin, Chiara Ghidini, Simone Paolo Ponzetto and Christoph Lange (eds.)), pp. 758-773, Springer-Verlag, May, 2016.
This paper presents the WarSampo system for publishing collections of heterogeneous, distributed data about the Second World War on the Semantic Web. WarSampo is based on harmonizing massive datasets using event-based modeling, which makes it possible to enrich datasets semantically with each other’s contents. WarSampo has two components: First, a Linked Open Data (LOD) service WarSampo Data for Digital Humanities (DH) research and for creating applications related to war history. Second, a semantic WarSampo Portal has been created to test and demonstrate the usability of the data service. The WarSampo Portal allows both historians and laymen to study war history and destinies of their family members in the war from different interlinked perspectives. Published in November 2015, the WarSampo Portal had some 20,000 distinct visitors during the first three days, showing that the public has a great interest in this kind of application.
(total: 18 publications)