phone: +358 50 431 6071 (office)
room: B126 @ Department of Computer Science, Computer Science bulding, Konemiehentie 2, Espoo
postal address: Department of Computer Science,
P.O. Box 15400, FI-00076 Aalto, Finland
Currently working in the Severi project.
Mikko Koho, Erkki Heino, Petri Leskinen, Minna Tamper, Esko Ikkala, Eetu Mäkelä, Jouni Tuominen and Eero Hyvönen: Linked Open Data and Ontology Infrastructure for Second World War History
. Submitted. bib pdf
Data about the Second World War (WW2) is heterogeneous and distributed in different organizations and countries. This paper argues that in order to create aggregated global views of the war, a shared semantic infrastructure is needed, including data models of the real world events, metadata schemas for presenting their documentation, a data harmonization model for data aggregation, and shared domain ontologies for populating the schemas in an interoperable way. As a solution, a Linked Open Data service is presented for publishing data about Finland in WW2. The service is based on W3C Semantic Web standards and best practices, including content negotiation, SPARQL API, download, automatic documentation, and other services supporting re-use of the data. The ontologies and data in the service, totalling ca. 9 million triples, is in use in seven end-user applications of the WarSampo portal, that have had tens of thousands of end-users.
Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä, Minna Tamper, Aki Hietanen and Eero Hyvönen: Law and Justice as a Linked Open Data Service
. Submitted. bib pdf
Everybody is expected to know and obey the law in today’s society. Governments therefore publish legislation and case law widely in print and on the web. Such legal information is provided for human consumption, but the information is usually not available as data for algorithmic analysis and applications to use. However, this would be beneficial in many use cases, such as building more intelligent juridical online services and conducting research into legislation and legal practice. To address these needs, this paper presents Semantic Finlex, a national in-use data resource and system for publishing Finnish legislation and related case law as a Linked Open Data service with applications. The system transforms and interlinks on a regular basis data from the legacy legal database Finlex of the Ministry of Justice into Linked Open Data, based on the new European standards ECLI and ELI. The data is hosted on a ”7-star” SPARQL endpoint with a variety of related services available that ease data re-use. Rich Internet Applications using only SPARQL for data access are presented as first application demonstrators of the data service.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen and Kirsi Keravuori: Semantic National Biography of Finland
. November, 2017. Submitted for review. bib pdf
This paper presents the idea and project of transforming and using the textual biographies of the National Biography of Finland, published by the Finnish Literature Society, as Linked (Open) Data. The idea is to publish the lives as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint in the Linked Data Finland (LDF.fi) service, on top of which various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research of hgroups of persons using methods of prosopography. The basic biographical data is enriched by extracting events from unstructured texts and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for research in Digital Humanities. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional sources, such as museum collections, library databases, and archives.
Petri Leskinen, Mikko Koho, Erkki Heino, Minna Tamper, Esko Ikkala, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen: Modeling and Using an Actor Ontology of Second World War Military Units and Personnel
. Proceedings of the 16th International Semantic Web Conference (ISWC 2017)
(Claudia d Amato, Miriam Fernandez, Valentina Tamma, Freddy Lecue, Philippe Cudré-Mauroux, Juan Sequeda, Christoph Lange and Jeff Heflin (eds.)), pp. 280-296, Springer-Verlag, Vienna, Austria, October, 2017. bib pdf link
This paper presents a model for representing historical military personnel and army units, based on large datasets about World War II in Finland. The model is in use in WarSampo data service and semantic portal, which has had tens of thousands of distinct visitors. A key challenge is how to represent ontological changes, since the ranks and units of military personnel, as well as the names and structures of army units change rapidly in wars. This leads to serious problems in both search as well as data linking due to ambiguity and homonymy of names. In our solution, actors are represented in terms of the events they participated in, which facilitates disambiguation of personnel and units in different spatio-temporal contexts. The linked data in the WarSampo Linked Open Data cloud and service has ca. 9 million triples, including actor datasets of ca. 100 000 soldiers and ca. 16 100 army units. To test the model in practice, an application for semantic search and recommending based on data linking was created, where the spatio-temporal life stories of individual soldiers can be reassembled dynamically by linking data from different datasets. An evaluation is presented showing promising results in terms of linking precision.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo: Publishing and Using Linked Open Data about the Second World War
. EuropeanaTech Insight, no. 7, Europeana, September, 2017. bib pdf link
The article overviews the system WarSampo – Finnish World War 2 on the Semantic Web, the winner of the LODLAM Challenge 2017 Open Data Prize on June 29 in Venice, Italy.
Minna Tamper, Petri Leskinen, Esko Ikkala, Arttu Oksanen, Eetu Mäkelä, Erkki Heino, Jouni Tuominen, Mikko Koho and Eero Hyvönen: AATOS – a Configurable Tool for Automatic Annotation
. Proceedings, Language, Technology and Knowledge (LDK 2017)
, pp. 276-289, Springer-Verlag, Galway, Ireland, June, 2017. bib pdf link
This paper presents an automatic annotation tool AATOS for providing documents with semantic annotations. The tool links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different natural language Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.
Erkki Heino, Minna Tamper, Eetu Mäkelä, Petri Leskinen, Esko Ikkala, Jouni Tuominen, Mikko Koho and Eero Hyvönen: Named Entity Linking in a Complex Domain: Case Second World War History
. Proceedings, Language, Technology and Knowledge (LDK 2017)
, pp. 120-133, Springer-Verlag, Galway, Ireland, June, 2017. bib pdf link
This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of 1) military units, 2) places and 3) people in the context of rich Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced, and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data as well as the target authority.
Minna Tamper: Extraction of Entities and Concepts from Finnish Texts
. MSc Thesis (in English), Aalto University, School of Science, Degree Programme in Computer Science and Engineering, Dec, 2016. bib pdf
Keywords are used in many document databases to improve search. The process of assigning keywords from controlled vocabularies to a document is called subject indexing. If the controlled vocabulary used for indexing is an ontology, with semantic relations and descriptions of concepts, the process is also called semantic annotation. In this thesis an automatic annotation tool was created to provide the documents with semantic annotations. The application links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated legislation of Finnish legislation. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: Publishing Second World War History as Linked Data Events on the Semantic Web
. Proceedings of Digital Humanities 2016, short papers
, pp. 571-573, Kraków, Poland, July, 2016. bib pdf link
Data about wars is typically heterogeneous, distributed in the data silos of the fighting parties, multilingual, and often controversial depending on the political point of view. It is therefore hard for the historians to get a global picture of what has actually happened, to whom, where, when, and how. We argue that Semantic Web and Linked Data technologies are a very promising approach for modeling, harmonizing, and aggregating data about war history. Our goal is to make it possible, for both historians and laymen, to study history in a contextualized way where linked datasets enrich each other. The paper presents the in-use WarSampo 1 system, where massive collections of heterogeneous data about the (Finnish) history of the Second World War are harmonized using an event-based approach, and provided as a Linked Open Data service for applications to use. As a use case, a semantic portal WarSampo providing six different perspectives to the war based on events is presented.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo Data Service and Semantic Portal for Publishing Linked Open Data about the Second World War History
. The Semantic Web – Latest Advances and New Domains (ESWC 2016)
(Harald Sack, Eva Blomqvist, Mathieu d Aquin, Chiara Ghidini, Simone Paolo Ponzetto and Christoph Lange (eds.)), pp. 758-773, Springer-Verlag, May, 2016. bib pdf
This paper presents the WarSampo system for publishing collections of heterogeneous, distributed data about the Second World War on the Semantic Web. WarSampo is based on harmonizing massive datasets using event-based modeling, which makes it possible to enrich datasets semantically with each others’ contents. WarSampo has two components: First, a Linked Open Data (LOD) service WarSampo Data for Digital Humanities (DH) research and for creating applications related to war history. Second, a semanticWarSampo Portal has been created to test and demonstrate the usability of the data service. The WarSampo Portal allows both historians and laymen to study war history and destinies of their family members in the war from different interlinked perspectives. Published in November 2015, theWarSampo Portal had some 20,000 distinct visitors during the first three days, showing that the public has a great interest in this kind of applications.
(total: 11 publications)