Linked Open Data Infrastructure for Digital Humanities in Finland
Digital Humanities (DH) is a major new research paradigm at the crossroads of computing, humanities, and social sciences.
The main idea is to develop and use novel computational methods, such as data analysis, topic modeling, visualization,
network analysis, deep learning, and artificial intelligence, to solve research problems in Social Sciences and Humanities (SSH)
based on big data that is becoming available as a result of digitalization of the society.
DH matches well with the multidisciplinary strategy of Aalto, and there are indeed DH research activities in five Aalto schools
(Science; Arts, Design and Architecture; Engineering; Electrical Engineering; Business). At the University of Helsinki (UH),
the Helsinki Centre for Digital Humanities HELDIG was recently established as a major strategic profiling
action of the university with eight new HELDIG professors nominated in six faculties, and a 10MEUR budget for 2016-2020.
There are substantial DH activities also in other Finnish universities, such as the University of Turku and University of Tampere, and in major Cultural Heritage (CH)
and media organizations, such as the National Library, National Archives, Finnish Heritage Agency (Museovirasto), Finnish Literature Society (SKS),
National Broadcasting Company Yle, and many others.
In the EU, DH research infrastructure work is coordinated by the EU ERIC DARIAH -- Digital Research Infrastructure for the Arts and Humanities.
Aalto and UH have joined DARIAH as co-operative partners.
LODI4DH is a joint initiative of Aalto University, Department of Computer Science, and University of Helsinki, HELDIG Centre for Digital Humanities,
for creating centralized national Linked Data services for open science.
The services enable publication and utilization of datasets for data-intensive DH research in structured,
standardized formats via open interfaces.
LODI4DH is based on the large collaboration network and software created during a long line of national projects in DH between UH and Aalto
since 2002 that created several in-use infrastructure prototypes, such as the ONKI ontology service,
Finto ontology service at the National Library of Finland (that deployed SKOS-based parts of ONKI as a national service, and has been developing them further),
and Linked Data Finland platform LDF.fi.
This line of research started with the national FinnONTO project series (2003-2012)
on creating a national ontology infrastructure in Finland,
and has continued with, e.g., the projects Linked Data Finland (2012-2014)
and Linked Open Data Science Service by the SeCo group at Aalto University and University of Helsinki.
A short video presents the vision and work behind LODI4DH (at the 83rd Annual Meeting of the Association for Information Science and Technology, USA).
ONKI/Finto and LDF.fi already have had a wide user base demonstrating the need for the LODI4DH infrastructure.
Applications based on them have also made their way from academic research into real use. For example, the Sampo series of semantic portals have had millions of users on the Web. Many museums in Finland, e.g., Espoo City Museum, AKSELI Consortium of 8 museums,
and the new national KOOKOS cataloging system make use of the ONKI/Finto ontologies.
In addition to the Finnish projects, there are several collaborative research projects with international universities,
such as Oxford, Stanford, Colorado, and Pennsylvania, where the Finnish Linked Data services for DH have been used.
LODI4DH focuses on DH research infrastructures, but the underlying Linked Data and Semantic Web technology can be, and has
been, utilized in other fields of research, too, substantially extending the utilization potential of the infrastructure.
LODI4DH aims at harnessing all this work into sustainable national services, and integrating the work as a component
into the EU ERIC DARIAH infrastructure. LODI4DH infrastructure is open source, publishes open data, and is free of charge for everyone to use.
National Roadmap of Research Infrastructures
In December 2020, the national Digital Humanities infrastructure proposal "Common Language Resources and Technology Infrastructure (FIN-CLARIAH)" coordinated by the HELDIG Centre was accepted as an initiative in the research
infrastructure roadmap of the Academy of Finland, including several Finnish universities and cultural heritage organizations. LODI4DH is included in this initiative.
Domain Ontologies for Data Linking
Data from collaborating organizations is aggregated into shared open domain ontologies for
1) historical places and maps, 2) historical persons, 3) events, 4) keyword concepts, and 5) times.
These core ontologies, provided as web services, are used as “semantic glue” in data linking and fusion.
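The "semantic glue" idea can be illustrated with a minimal sketch: when two independently produced datasets annotate their records with the same ontology URI, the records can be joined through that URI. All URIs, record fields, and names below are hypothetical and invented for illustration; they do not come from the actual LODI4DH ontologies or datasets.

```python
# Hypothetical place ontology: URI -> description (invented for illustration).
PLACES = {
    "http://example.org/places/p1": {"label": "Viipuri"},
    "http://example.org/places/p2": {"label": "Turku"},
}

# Two independent datasets that both refer to places by the shared URIs.
BIOGRAPHIES = [
    {"person": "Person A", "birth_place": "http://example.org/places/p1"},
    {"person": "Person B", "birth_place": "http://example.org/places/p2"},
]
PHOTOGRAPHS = [
    {"photo": "Photo 1", "taken_at": "http://example.org/places/p1"},
    {"photo": "Photo 2", "taken_at": "http://example.org/places/p1"},
]


def link_by_place(biographies, photographs, places):
    """Group records from both datasets under the place URI they share."""
    linked = {
        uri: {"label": place["label"], "persons": [], "photos": []}
        for uri, place in places.items()
    }
    for bio in biographies:
        if bio["birth_place"] in linked:
            linked[bio["birth_place"]]["persons"].append(bio["person"])
    for photo in photographs:
        if photo["taken_at"] in linked:
            linked[photo["taken_at"]]["photos"].append(photo["photo"])
    return linked


if __name__ == "__main__":
    for uri, entry in link_by_place(BIOGRAPHIES, PHOTOGRAPHS, PLACES).items():
        print(uri, entry)
```

Because both datasets use the same identifier for a place, no string matching on place names is needed at linking time; the reconciliation effort is moved into the shared ontology.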
Historical Places and Maps
As for historical places and maps, our work aims at developing the Finnish Ontology Service of Historical Places and Maps (Hipla), cf. the demonstrator Hipla.fi.
This work started already in FinnONTO, and has been revitalized in the context of building the National Semantic Biography of Finland and
related other biographical systems, see
Semantic Biographies Based on Linked Data.
This line of research in LODI4DH builds upon our work on History on the Semantic Web, with applications such as
WarSampo -- Finnish WW2 on the Semantic Web.
Historical Keyword Concepts
When developing ONKI and Finto, lots of Finnish keyword thesauri were converted and developed further into RDFS and SKOS ontologies,
interlinked into a global linked data cloud called the KOKO, and published as ontology services.
However, more work is needed here, for example, in areas such as archeology, built environments, history, and law.
LODI4DH creates a time ontology for making references to historical times and periods of time, including names of time periods.
Services for calendar date conversion will be included in the system. Here results from international projects can be utilized.
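As a rough illustration of what a calendar date conversion involves, the following sketch converts a Julian calendar date to the corresponding Gregorian date via the Julian Day Number (JDN). It is only an assumption about the kind of operation such a service could offer, not the project's implementation; the function names are made up for this example, and the result is limited to the year range supported by Python's `datetime.date`.

```python
from datetime import date

# Offset between a Julian Day Number and Python's proleptic Gregorian ordinal:
# date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426.
JDN_OFFSET = 1721425


def julian_calendar_to_jdn(year, month, day):
    """Return the Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # years counted from a shifted epoch
    m = month + 12 * a - 3          # months counted from March (March = 0)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to a (proleptic) Gregorian date."""
    jdn = julian_calendar_to_jdn(year, month, day)
    return date.fromordinal(jdn - JDN_OFFSET)


if __name__ == "__main__":
    # 2 September 1752 (Julian) is 13 September 1752 (Gregorian).
    print(julian_to_gregorian(1752, 9, 2))  # prints 1752-09-13
```

Going through the day number keeps the two calendars independent of each other, so further calendars could be added as extra conversions to and from the JDN.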
Harmonizing Metadata Models
The project works on developing harmonizing metadata models for representing semantic data,
such as Bio CRM for extending
CIDOC CRM to representing biographical data.
We also work on publishing and sharing interlinked core datasets that are deemed to be useful in different research projects and applications.
These datasets are expected to evolve into a kind of Finnish Linked Open Data Cloud. Work has started on, e.g., the following datasets:
Linked Open Name Archive, based on data about 2.7 million place names provided by the Institute for the Languages in Finland (Kotus);
Semantic National Biography, based on over 13 000 biographies of prominent Finns edited by the Finnish Literature Society (SKS);
WarSampo datasets related to WW2 history, provided by the National Archives of Finland, Defence Forces, and others;
University of Helsinki Person Registry (1640-2000), provided by the University of Helsinki Archives;
Semantic Finlex legislation and case law data, provided by the Ministry of Justice.
Linked Data Services
As for the publishing platform, the Linked Data Finland platform is used and developed further with additional services for DH
data production, publishing, data analysis, and visualization.
We also produce educational online materials, developing, e.g., the Linked Data School LinDa,
for using Linked Data technology in DH research and application development.
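For readers new to Linked Data services, the sketch below shows how a client might prepare a query against a SPARQL endpoint using the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint URL is a placeholder and the query is a generic SKOS example; both are assumptions for illustration and do not describe a documented LODI4DH service address. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: an assumption for illustration, not a real service URL.
ENDPOINT = "https://example.org/sparql"

# Generic example query: ten SKOS concepts with their Finnish preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "fi")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Passing the resulting request to `urllib.request.urlopen` would execute the query against a real endpoint; the response could then be parsed with the `json` module.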
Prof. Eero Hyvönen, Aalto University and University of Helsinki (HELDIG)
Dr. Jouni Tuominen, University of Helsinki (HELDIG) and Aalto University
An overview of the project is presented in these slides.
Eero Hyvönen: How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Oct, 2021. Keynote presentation for the DCMI 2021 conference.
The vision behind the Semantic Web is to build a global Web of Data (Giant Global Graph, GGG) for machines to use: based on this an interoperable and intelligent transnational WWW for humans can be created cost-efficiently. This keynote presentation for the DCMI 2021 conference addresses this grand challenge on a national level, as in practice much of the data available are often related to each other within national cultures, borders, organizations, and are represented using national languages, metadata models, vocabularies, and local conventions. This presentation overviews and discusses the vision and lessons learned in Finland on developing and deploying a cross-domain national ontology service infrastructure and Linked Open Data (LOD) publishing framework, extending the classic 5-star model to a 7-star model for better data re-usability (6. star) and quality (7. star). To test and demonstrate the infrastructure, a series of semantic portals and LOD services have been created using the Sampo model that has evolved gradually in 2002-2021 through lessons learned when developing and publishing the Sampo series of systems, including MuseumFinland (2004), HealthFinland (2009), CultureSampo (2009), BookSampo (2011), WarSampo (2015), BiographySampo (2018), NameSampo (2019), WarVictimSampo (2019), Mapping Manuscript Migrations (2020), AcademySampo (2021), as well as FindSampo, LawSampo, and ParliamentSampo underway. These systems cover a wide range of application domains and have attracted up to millions of users on the Semantic Web depending on the application, suggesting feasibility of the proposed model. This work shows a shift of focus in research on semantic portals from data aggregation and exploration systems (1. generation systems) to systems supporting research with data analytic tools (2. generation systems), and finally to automatic knowledge discovery and Artificial Intelligence (3. generation systems).
Mikko Koho, Esko Ikkala, Petri Leskinen, Minna Tamper, Jouni Tuominen and Eero Hyvönen: WarSampo Knowledge Graph: Finland in the Second World War as Linked Open Data. Semantic Web – Interoperability, Usability, Applicability, vol. 12, no. 2, pp. 265-278, January, 2021.
The Second World War (WW2) is arguably the most devastating catastrophe of human history, a topic of great interest to not only researchers but the general public. However, data about the Second World War is heterogeneous and distributed in various organizations and countries making it hard to utilize. In order to create aggregated global views of the war, a shared ontology and data infrastructure is needed to harmonize information in various data silos. This makes it possible to share data between publishers and application developers, to support data analysis in Digital Humanities research, and to develop data-driven intelligent applications. As a first step towards these goals, this article presents the WarSampo knowledge graph (KG), a shared semantic infrastructure, and a Linked Open Data (LOD) service for publishing data about WW2, with a focus on Finnish military history. The shared semantic infrastructure is based on the idea of representing war as a spatio-temporal sequence of events that soldiers, military units, and other actors participate in. The used metadata schema is an extension of CIDOC CRM, supplemented by various military historical domain ontologies. With an infrastructure containing shared ontologies, maintaining the interlinked data brings upon new challenges, as one change in an ontology can propagate across several datasets that use it. To support sustainability, a repeatable automatic data transformation and linking pipeline has been created for rebuilding the whole WarSampo KG from the individual source datasets. The WarSampo KG is hosted on a data service based on W3C Semantic Web standards and best practices, including content negotiation, SPARQL API, download, automatic documentation, and other services supporting the reuse of the data. The WarSampo KG, a part of the international LOD Cloud and totalling ca. 
14 million triples, is in use in nine end-user application views of the WarSampo portal, which has had over 400 000 end users since its opening in 2015.
Mikko Koho, Petri Leskinen and Eero Hyvönen: Integrating Historical Person Registers as Linked Open Data in the WarSampo Knowledge Graph. Semantic Systems. In the Era of Knowledge Graphs. SEMANTiCS 2020 (Eva Blomqvist, Paul Groth, Victor de Boer, Tassilo Pellegrini, Mehwish Alam, Tobias Käfer, Peter Kieseberg, Sabrina Kirrane, Albert Meroño-Peñuela and Harshvardhan J. Pandit (eds.)), Lecture Notes in Computer Science, vol. 12378, pp. 118-126, Springer, Cham, Amsterdam, The Netherlands, October, 2020.
Semantic data integration from heterogeneous, distributed data silos enables Digital Humanities research and application development employing a larger, mutually enriched and interlinked knowledge graph. However, data integration is challenging, involving aligning the data models and reconciling the concepts and named entities, such as persons and places. This paper presents a record linkage process to reconcile person references in different military historical person registers with structured metadata. The information about persons is aggregated into a single knowledge graph. The process was applied to reconcile three person registers of the popular semantic portal WarSampo -- Finnish World War 2 on the Semantic Web. The registers contain detailed information about some 100,000 people and are individually maintained by domain experts. Thus, the integration process needs to be automatic and adaptable to changes in the registers. An evaluation of the record linkage results is promising and provides some insight into military person register reconciliation in general.
Chris A. Sula, Kalani Craig, Michelle Dalmu, Alex Humphreys, Eero Hyvönen, Hannah L. Jacobs, Humphrey Keah, Joseph Kiplangat, Thea Lindquist, Nicholas Weber, Scott B. Weingart: Infrastructures of Digital Humanities. 83rd Association for Information Science and Technology (ASIS&T) Annual Meeting, proceedings, Association for Information Science and Technology, Silver Spring, Maryland, USA, October, 2020.
Mikko Koho, Erkki Heino, Petri Leskinen, Esko Ikkala, Minna Tamper, Kasper Apajalahti, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen: WarSampo Knowledge Graph. Zenodo, October, 2019. Dataset.
WarSampo Knowledge Graph includes harmonized data of different kinds concerning the Second World War in Finland, separated in different subgraphs representing events, actors, places, photographs, and other aspects and documentation of the war. The data covers the Winter War 1939-1940 against the Soviet attack, the Continuation War 1941-1944 where the occupied areas of the Winter War were temporarily regained, and the Lapland War 1944-1945, where the Finns pushed the German troops away from Lapland.
Mikko Koho, Esko Ikkala and Eero Hyvönen: Reassembling the Lives of Finnish Prisoners of the Second World War on the Semantic Web. Proceedings of the Third Conference on Biographical Data in a Digital World (BD 2019), Varna, Bulgaria, September, 2019.
This paper presents first results of a new, ninth application perspective for the semantic portal WarSampo - Finnish WW2 on the Semantic Web, based on a database of ca. 4450 Finnish prisoners of war in the Soviet Union. Our key idea is to reassemble the life of each prisoner of war by using Linked Data, based on information about the person in different data sources. Using the enriched aggregated data, a biographical global home page for each prisoner of war can be created, that is more complete than information in individual data sources. The application perspective is targeted to researchers of military history, to study and analyze the data in order to form new research questions or hypotheses, as well as to public in the large looking for information e.g., about their relatives that were captured as prisoners of war. Employing the faceted search of the application perspective, prosopographical research on subgroups of prisoners is possible.
Lia Gasbarra, Mikko Koho, Ilkka Jokipii, Heikki Rantala and Eero Hyvönen: An Ontology of Finnish Historical Occupations. The Semantic Web: ESWC 2019 Satellite Events (Hitzler, Pascal, Kirrane, Sabrina, Hartig, Olaf, de Boer, Victor, Vidal, Maria-Esther, Maleshkova, Maria, Schlobach, Stefan, Hammar, Karl, Lasierra, Nelia, Stadtmüller, Steffen, Hose, Katja and Verborgh, Ruben (eds.)), Lecture Notes in Computer Science, pp. 64-68, Springer, Cham, Portoroz, Slovenia, June, 2019.
Historical datasets often impose the need to study groups of people based on occupation or social status. This paper presents first results in creating an ontology of historical Finnish occupations, AMMO, that enables selection of groups of people based on their occupation, occupational groups, or socioeconomic class. AMMO is linked to the international historical occupation classification HISCO and to a modern Finnish occupational classification for interoperability. AMMO will be used as a component in two semantic portals for Finnish war history.
Mikko Koho, Lia Gasbarra, Jouni Tuominen, Heikki Rantala, Ilkka Jokipii and Eero Hyvönen: AMMO Ontology of Finnish Historical Occupations. Proceedings of the First International Workshop on Open Data and Ontologies for Cultural Heritage (ODOCH 19), vol. 2375, pp. 91-96, CEUR Workshop Proceedings, Rome, Italy, June, 2019.
This paper introduces AMMO Ontology of Finnish Historical Occupations. AMMO is based on thousands of occupation labels extracted from three Finnish military historical datasets of the early 20th century: the first consists of the ca. 40 000 war-related death records around the time of the Finnish Civil War (1914–1922); the second consists of the ca. 95 000 death records of Finnish soldiers in the Winter War and Continuation War (1939–1944); the third contains the ca. 4500 records of Finnish prisoners of war in the Soviet Union during the WW2. Our goal from a Digital Humanities perspective is to use AMMO to study military history and these datasets based on the occupation and social status of the soldiers. AMMO will also be used as a component for faceted search and semantic recommendation in two semantic portals for Finnish military history. AMMO is aligned with the international historical occupation classification HISCO and with a modern Finnish occupational classification for international and national interoperability. The ontology is published as Linked Open Data in an ontology service.
Mikko Koho, Esko Ikkala, Erkki Heino and Eero Hyvönen: Maintaining a Linked Data Cloud and Data Service for Second World War History. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, vol. 11196, Springer-Verlag, October-November, 2018.
Mikko Koho, Erkki Heino, Esko Ikkala, Eero Hyvönen, Reijo Nikkilä, Tiia Moilanen, Katri Miettinen and Pertti Suominen: Integrating Prisoners of War Dataset into the WarSampo Linked Data Infrastructure. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), CEUR Workshop Proceedings, vol. 2084, Helsinki, Finland, March, 2018.
One of the great promises of Linked Data and the Semantic Web standards is to provide a shared data infrastructure into which more and more data can be imported and aligned, forming a sustainable, ever growing knowledge graph or linked data cloud, Web of Data. This paper studies and evaluates this idea in the context of the WarSampo Linked Data cloud, providing an infrastructure for data related to the Second World War in Finland. As a case study, a new database of prisoners of war with related contents is transformed into linked data and integrated into WarSampo. Lessons learned are discussed in relation to using traditional data publishing approaches.