History on the Semantic Web
Event Gazetteers and History Ontologies
Events are an essential component of cultural heritage (CH) Linked Data (LD): they link actors, places, times, objects, and other events into larger narrative structures, providing a rich basis for semantic searching, recommending, analysis, and visualization of CH data.
For example, biographies, histories, photos, and paintings often reference or depict events.
In this research line of SeCo, we develop shared vocabularies (gazetteers, ontologies) of events, such as the “Battle of Normandy” or “Crucifixion of Jesus”, and apply these representations in publishing and utilizing CH content on the Semantic Web.
In our view, event gazetteers, a kind of semantic historical timeline, are a basis for developing and interlinking historical ontologies.
Historical ontologies are 1) necessary for facilitating the aggregation and linking of heterogeneous content from various collections and 2) very useful in developing "intelligent" cultural heritage applications, such as semantic portals.
Semantic Timeline and History of Finland
Our work began in 2006 with the development of a small RDF timeline of 19th-century Finnish history. The timeline was based on the Chronology of Finnish History developed by the Agricola community of Finnish historians, and it is included in the CultureSampo semantic portal, published in 2008. Since then the timeline has been extended to cover the whole Agricola Chronology, and a new online demonstrator is under development.
Semantic Timeline and History of World War I
In 2011 we started working on war history, in particular the history of World War I, in collaboration with Prof. Thea Lindquist of the University of Colorado, Boulder.
A set of general requirements for an event gazetteer is being developed based on the needs of publishing, aggregating, and reusing cultural heritage content as Linked Data, as well as a metadata model addressing the presented requirements for representing historical events. The model is being applied in a case study aimed at developing an event ontology for World War I (WWI) with applications. Our goals from an end-user perspective are twofold: 1) Facilitate event-based cataloging for curators in memory organizations; 2) Utilize semantic event descriptions and narrative event structures in end-user applications for searching and linking documents and other content about WWI, and for structuring and visualizing them.
(Cf. slides presented at the CIDOC 2012 conference).
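The kind of event description discussed above can be illustrated with a minimal sketch. It is not the project's actual metadata model: the namespace `http://example.org/ww1/`, the `ex:` property names, and the local identifiers are all hypothetical, chosen only to show how one event record can tie together a time span, a place, actors, and an enclosing event.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Hypothetical namespace and property names, used only for illustration.
BASE = "http://example.org/ww1/"

Triple = Tuple[str, str, str]


@dataclass
class Event:
    local_id: str                      # local identifier appended to BASE
    label: str                         # human-readable name
    start: str                         # ISO 8601 start date
    end: str                           # ISO 8601 end date
    place: str                         # local id of a place resource
    actors: List[str] = field(default_factory=list)
    part_of: Optional[str] = None      # enclosing event, if any

    def triples(self) -> Iterator[Triple]:
        """Yield (subject, predicate, object) statements for this event."""
        s = BASE + self.local_id
        yield (s, "rdfs:label", self.label)
        yield (s, "ex:startDate", self.start)
        yield (s, "ex:endDate", self.end)
        yield (s, "ex:place", BASE + self.place)
        for actor in self.actors:
            yield (s, "ex:actor", BASE + actor)
        if self.part_of is not None:
            yield (s, "ex:partOf", BASE + self.part_of)


somme = Event(
    local_id="battle_of_the_somme",
    label="Battle of the Somme",
    start="1916-07-01",
    end="1916-11-18",
    place="somme",
    actors=["british_army", "french_army", "german_army"],
    part_of="western_front",
)

for triple in somme.triples():
    print(triple)
```

The `part_of` link is what would let an application assemble individual events into larger narrative structures; a real gazetteer would express the same statements in RDF with an agreed vocabulary.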
Our work builds upon the legacy, results (ontologies, ontology services, tools, pilot applications), and established collaboration network of the FinnONTO project series 2003-2012.
Finnish World War II on the Semantic Web: WarSampo
This project (2015-2017) aims to publish large heterogeneous sets of Linked Open Data related to World War II in Finland.
To demonstrate use cases of such data, the WarSampo semantic portal, which provides different perspectives on the war for different use cases, is being developed.
See the WarSampo site for more details and demos.
WarSampo is the next member in the "Sampo" series of our Linked Data based systems for Cultural Heritage and Digital Humanities; other members include TravelSampo.
Prof. Eero Hyvönen, Aalto University
Dr. Eetu Mäkelä, Aalto University
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: BiographySampo – A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web. November, 2018.
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the web, based on Linked Data. Firstly, a biographical dictionary on the web should provide the end user with an enhanced reading experience of biographies by enriching them with data linking and reasoning. Secondly, the web publication should include not only biographies for humans to read but also versatile tooling for 1) biographical research of individual persons as well as for 2) prosopographical research on groups of people. To support these arguments, we present the design principles and the implementation of the semantic portal “BiographySampo – Finnish Life Stories on the Semantic Web”, especially from the end user’s point of view. The system is based on a Linked Data service and knowledge graph extracted automatically from a collection of 13,100 textual biographies, written by 900 researchers. The texts are enriched with data linking to 16 external data sources and by harvesting external collection data from libraries, museums, and archives. The portal, consisting of seven different interlinked application perspectives, was released on September 27, 2018, for free public use for Digital Humanities researchers and the general public.
Jouni Tuominen, Eero Hyvönen and Petri Leskinen: Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research. Proceedings of the Second Conference on Biographical Data in a Digital World 2017 (BD2017), vol. 2119, pp. 59-66, CEUR Workshop Proceedings, Linz, Austria, 2018.
Biographies make a promising application case of Linked Data: they can be used, e.g., as a basis for Digital Humanities research in prosopography and as a key data and linking resource in semantic Cultural Heritage (CH) portals. In both use cases, a semantic data model for harmonizing and interlinking heterogeneous data from different sources is needed. This paper presents such a data model, Bio CRM, with the following key ideas: 1) The model is a domain specific extension of CIDOC CRM, making it applicable not only to biographical data but to other CH data, too. 2) The model makes a distinction between enduring unary roles of actors, their enduring binary relationships, and perduring events, where the participants can take different roles modeled as a role concept hierarchy. 3) The model can be used as a basis for semantic data validation and enrichment by reasoning. 4) The enriched data conforming to Bio CRM is targeted to be used by SPARQL queries in flexible ways, using a hierarchy of roles in which participants can be involved in events.
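To make the idea of querying through a role concept hierarchy concrete, here is a small sketch in plain Python. It does not use the Bio CRM vocabulary: the role names, the `ROLE_PARENT` table, and the participation records are hypothetical. It only shows how asking for a general role can return participants recorded with more specific roles.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical role hierarchy: each role maps to its more general parent role.
ROLE_PARENT: Dict[str, str] = {
    "painter": "artist",
    "sculptor": "artist",
    "artist": "participant",
    "teacher": "participant",
}

# Hypothetical (event, person, role) participation records.
PARTICIPATIONS: List[Tuple[str, str, str]] = [
    ("exhibition_1", "person_a", "painter"),
    ("exhibition_1", "person_b", "sculptor"),
    ("course_1", "person_c", "teacher"),
]


def is_a(role: Optional[str], ancestor: str) -> bool:
    """Return True if role equals ancestor or lies below it in the hierarchy."""
    while role is not None:
        if role == ancestor:
            return True
        role = ROLE_PARENT.get(role)  # None once the top of the hierarchy is passed
    return False


def participants_in_role(ancestor: str) -> List[str]:
    """List persons whose recorded role is the given role or a sub-role of it."""
    return sorted(p for _, p, r in PARTICIPATIONS if is_a(r, ancestor))


print(participants_in_role("artist"))
print(participants_in_role("participant"))
```

In a SPARQL setting, a comparable upward traversal is usually written with a property path over whatever property links a role to its broader role; which property Bio CRM uses for this is not stated here.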
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: Biografiasammon tekoäly yhdistää ja rikastaa suomalaiset elämäkerrat semanttisessa webissä [The artificial intelligence of BiographySampo links and enriches Finnish biographies on the Semantic Web]. Aalto University, Semantic Computing Research Group (SeCo), November, 2018.
The BiographySampo system launches a new era in publishing and using biography collections on the web. Its core material is the National Biography of Finland and other short biographies edited by the Finnish Literature Society (SKS) and scholarly societies: 13,100 life stories in total, written by 900 Finnish researchers. The innovation of BiographySampo is to use language technology, artificial intelligence, and Semantic Web technologies to create, from the biography texts and from related information in various sources, a knowledge graph and a national data infrastructure consisting of millions of connections between pieces of data. The knowledge graph is published in a Linked Data service, on top of which an intelligent web service, biografiasampo.fi, has been implemented. The service consists of seven application perspectives and is free and open to all, for use by citizens and Digital Humanities researchers.
Minna Tamper, Petri Leskinen, Kasper Apajalahti and Eero Hyvönen: Using Biographical Texts as Linked Data for Prosopographical Research and Applications. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, Springer-Verlag, November, 2018.
Jouni Tuominen, Eetu Mäkelä, Eero Hyvönen, Arno Bosse, Miranda Lewis and Howard Hotson: Reassembling the Republic of Letters - A Linked Data Approach. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), pp. 76-88, CEUR Workshop Proceedings, Helsinki, Finland, March, 2018.
Between 1500 and 1800, a revolution in postal communication allowed ordinary men and women to scatter letters across and beyond Europe. This exchange helped knit together what contemporaries called the respublica litteraria, Republic of Letters, a knowledge-based civil society, crucial to that era’s intellectual breakthroughs, and formative of many modern European values and institutions. To enable effective Digital Humanities research on the epistolary data distributed in different countries and collections, metadata about the letters have been aggregated, harmonised, and provided for the research community through the Early Modern Letters Online (EMLO) service. This paper discusses the idea and benefits of using Linked Data as a basis for the next digital framework of EMLO, and presents experiences of a first demonstrational implementation of such a system.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen and Kirsi Keravuori: Semantic National Biography of Finland. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), pp. 372-385, CEUR Workshop Proceedings, Vol-2084, Helsinki, Finland, March, 2018.
This paper presents the vision of publishing and utilizing textual biographies as Linked (Open) Data on the Semantic Web. As a case study, we publish the life stories of the National Biography of Finland, created by the Finnish Literature Society, as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint using the Linked Data Finland (LDF.fi) service. On top of the data service, various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research of groups of persons using methods of prosopography. The biographical data is enriched by extracting events from unstructured and semi-structured texts, and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for prosopographical research. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional data sources, such as museum collections, library databases, and archives.
Eetu Mäkelä, Juha Törnroos, Thea Lindquist and Eero Hyvönen: WW1LOD: An application of CIDOC-CRM to World War 1 linked data. International Journal on Digital Libraries, vol. 18, no. 4, pp. 333-343, Springer, November, 2017.
The CIDOC-CRM standard indicates that common events, actors, places and timeframes are important in linking together cultural material, and provides a framework for describing them. However, merely describing entities in this way in two datasets does not yet interlink them. To do that, the identities of instances still need to be either reconciled, or be based on a shared vocabulary. The WW1LOD dataset presented in this paper was created to facilitate both of these approaches for collections dealing with the First World War. For this purpose, the dataset includes events, places, agents, times, keywords, and themes related to the war, based on over ten different authoritative data sources from providers such as the Imperial War Museum. The content is harmonized into RDF, and published as a Linked Open Data service. While the dataset is generally based on CIDOC-CRM, some of the modeling choices deviate from it where our experience dictated. In the article, these deviations are discussed in the hope that they may serve as examples where CIDOC-CRM itself may warrant further examination. As a demonstration of use, the dataset and online service have been used to create a contextual reader application that is able to link together and pull in information related to WW1 from e.g. 1914–1918 Online, Wikipedia, WW1 Discovery, Europeana and the Digital Public Library of America.
Esko Ikkala, Eetu Mäkelä and Eero Hyvönen: TourRDF: Representing, Enriching, and Publishing Curated Tours Based on Linked Data. 19th International Conference of Knowledge Engineering and Management (EKAW 2014), Demo and Poster Papers, November, 2014.
Current mobile tourist guide systems are developed and used in separate data silos: each system and vendor tends to use its own proprietary, closed formats for representing tours and point of interest (POI) content. As a result, tour data cannot be enriched from other providers’ tour and POI repositories, or from other external data sources, even when such data is made publicly available by, e.g., cities willing to promote tourism. This paper argues that an open shared RDF-based tour vocabulary is needed to address these problems, and introduces such a model, TourRDF, extending the earlier TourML schema into the era of Linked Data. As a test and an evaluation of the approach, a case study based on data about the Unesco World Heritage site Suomenlinna fortress is presented.
Eero Hyvönen, Miika Alonen, Esko Ikkala and Eetu Mäkelä: Life Stories as Event-based Linked Data: Case Semantic National Biography. Proceedings of ISWC 2014 Posters & Demonstrations Track, CEUR Workshop Proceedings, October, 2014.
This paper argues, by presenting a case study and a demonstration on the web, that biographies make a promising application case of Linked Data: the reading experience can be enhanced by enriching the biographies with additional lifetime events, by providing the user with a spatio-temporal context for reading, and by linking the text to additional contents in related datasets.
Thea Lindquist, Michael Dulock, Juha Törnroos, Eero Hyvönen and Eetu Mäkelä: Using Linked Open Data to Enhance Subject Access in Online Primary Sources. Cataloging & Classification Quarterly, vol. 51, no. 8, Taylor & Francis, 2013.
Using online primary sources is both rewarding and challenging for users. Improving subject access is essential as these sources become increasingly important in educational curricula. A user needs assessment with humanities users showed improving findability and context for historical subjects were major needs. Linked Data can help by linking related concepts in the sources using specialized vocabularies, enriching them with outside resources, and enabling semantic services that empower users. This article discusses a project to enhance subject access in an online World War I collection by deep linking historical data on the civilian experience in occupied Belgium and France.
Thea Lindquist, Eero Hyvönen, Juha Törnroos, Eetu Mäkelä: Leveraging linked data to enhance subject access - A case study of the University of Colorado Boulder's World War I collection online. World Library and Information Congress: 78th IFLA General Conference and Assembly, Helsinki, IFLA, http://conference.ifla.org/ifla78, August, 2012.
Academic users often find work with online primary sources both rewarding and challenging. Improving subject access in these sources is essential as digital collections propagate and work with primary sources becomes increasingly important in humanities curricula. A user needs assessment was conducted with humanities users at the University of Colorado Boulder to facilitate engagement with these sources. Two of the major user needs identified were improving findability and context, particularly for historical subjects. Linked Data can help meet these needs by linking related concepts in the sources using a specialized vocabulary, enriching them with outside resources, and enabling semantically rich services that empower users. This paper discusses a project the authors undertook to enhance subject access in CU’s WWI Collection Online by deep linking historical data on the civilian experience in occupied Belgium. This work is intended to lead to a richer understanding of forces shaping the WWI period.
Eero Hyvönen, Thea Lindquist, Juha Törnroos and Eetu Mäkelä: History on the Semantic Web as Linked Data - An Event Gazetteer and Timeline for World War I. Proceedings of CIDOC 2012 - Enriching Cultural Heritage, Helsinki, Finland, CIDOC, http://www.cidoc2012.fi/en/cidoc2012/programme, June, 2012.
Events are an essential component of cultural heritage (CH) Linked Data (LD): they link actors, places, times, objects, and other events into larger narrative structures, providing a rich basis for semantic searching, recommending, analysis, and visualization of CH data. This paper argues that shared vocabularies (gazetteers, ontologies) of events, such as the “Battle of Normandy” or “Crucifixion of Jesus”, are necessary to facilitate the aggregation and linking of heterogeneous content from various collections. For example, biographies, histories, photos, and paintings often reference or depict events. A set of general requirements for an event gazetteer is presented, based on the needs of publishing, aggregating, and reusing cultural heritage content as Linked Data. After this, a metadata model addressing the presented requirements for representing historical events is outlined. The model is being applied in a case study aimed at developing an event ontology for World War I (WWI). Our goals from an end-user perspective are twofold: 1) Facilitate event-based cataloging for curators in memory organizations; 2) Utilize semantic event descriptions and narrative event structures in end-user applications for searching and linking documents and other content about WWI, and for structuring and visualizing them.
Eeva Ahonen and Eero Hyvönen: Publishing Historical Texts on the Semantic Web - A Case Study. Proceedings of the Third IEEE International Conference on Semantic Computing (ICSC2009), Berkeley, CA, USA, September, 2009.
Historical texts are an important component of cultural heritage, and are being digitized and published on the web in various portals for researchers and the public. However, searching and linking them with related contents is challenging due to the non-structured text form, digitization errors, and the differences and variations between old and modern language, including historical names (e.g. places), used for querying. This paper addresses these issues by presenting an approach and a system for publishing old texts on the semantic web. As a case study, an existing historical newspaper archive on the web is considered. In our model, semantic metadata is added to the text using automated concept extraction methods. Search is implemented with semantic techniques, by creating a multi-faceted search interface for the text materials. Problems due to OCR errors and spelling variants are addressed with a fuzzy string matching algorithm that tries to guess corresponding words in a lexicon and gives suggestions for corrected word forms. References between texts in the library as well as links between the library and external knowledge sources are formed by using shared ontologies for semantic annotations.
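The abstract does not specify which fuzzy matching algorithm was used. As a stand-in, the sketch below uses the similarity ratio from Python's standard `difflib` module to suggest lexicon entries for a possibly misrecognized token. The lexicon contents, the function name `suggest`, and the cutoff value are assumptions made for the example.

```python
import difflib
from typing import List, Sequence

# Hypothetical lexicon of place names; a real lexicon would be far larger.
LEXICON = ["Helsinki", "Helsingfors", "Turku", "Tampere", "Viipuri"]


def suggest(token: str, lexicon: Sequence[str] = LEXICON,
            n: int = 3, cutoff: float = 0.75) -> List[str]:
    """Return up to n lexicon entries that closely resemble the token.

    Matching is case-insensitive; results are ordered best match first.
    """
    by_lower = {word.lower(): word for word in lexicon}
    hits = difflib.get_close_matches(token.lower(), list(by_lower),
                                     n=n, cutoff=cutoff)
    return [by_lower[h] for h in hits]


# "Helsinkl" imitates an OCR error in which a final "i" was read as "l".
print(suggest("Helsinkl"))
```

A higher cutoff reduces false suggestions at the cost of missing heavily damaged words; handling historical spelling variants would likely need additional rules or a variant dictionary on top of plain string similarity.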
Eero Hyvönen, Olli Alm and Heini Kuittinen: Using an Ontology of Historical Events in Semantic Portals for Cultural Heritage. Proceedings of the Cultural Heritage on the Semantic Web Workshop at the 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 12, 2007.
We argue that an ontology of historical events is needed in semantic portals for cultural heritage for three reasons. First, ontological identifiers (URIs) of events, such as World War II or the coronation of Napoleon, are needed in order to make collection metadata mutually interoperable in terms of related events, in the same vein as identifiers are needed for identifying artifact types, persons, and geolocations when annotating collection items. Second, events are of central importance in creating semantic links between cultural contents in applications such as recommendation systems. Third, historical events are important as content items of their own, forming the backbone of chronological histories.
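The first two reasons can be illustrated with a toy example. Assuming two collections annotate their items with the same event URIs, items can be linked across collections simply by grouping on the URI. The URIs, collection contents, and the function name `link_by_event` below are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical shared event identifiers.
EVENT_WW2 = "http://example.org/events/world_war_ii"
EVENT_CORONATION = "http://example.org/events/coronation_of_napoleon"

# Hypothetical items from two independent collections.
museum_items = [
    {"id": "museum:1", "title": "Field telephone", "event": EVENT_WW2},
    {"id": "museum:2", "title": "Coronation engraving", "event": EVENT_CORONATION},
]
archive_items = [
    {"id": "archive:7", "title": "Letter from the front", "event": EVENT_WW2},
]


def link_by_event(*collections: List[dict]) -> Dict[str, List[str]]:
    """Group item ids by event URI, keeping events shared by several items."""
    index: Dict[str, List[str]] = defaultdict(list)
    for collection in collections:
        for item in collection:
            index[item["event"]].append(item["id"])
    return {uri: ids for uri, ids in index.items() if len(ids) > 1}


print(link_by_event(museum_items, archive_items))
```

Without a shared identifier, the same grouping would require matching free-text event names, which is exactly the reconciliation problem that an event ontology is meant to avoid.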