Semantic Biographies Based on Linked Data
Research Goals
We aim to develop methods and infrastructure for representing biographical narratives as Linked Data,
as well as practical demonstrators in which the usefulness of the new technology is tested and shown.
The general idea is to give the reader a richer reading experience by enriching the texts with linked contextual
information, such as the place and time of biographical events, and links to related persons, historical events, publications, paintings etc.
Biographical data can also be used as a basis for Digital Humanities research, where e.g. prosopographies and networks of people are studied.
BiographySampo - Finnish National Biographies on the Semantic Web
As a first experiment, the Semantic National Biography, based on 6300 short biographies of the National Biography of Finland
of the Finnish Literature Society, was created in 2014 with an online
demo and a research paper (cf. publications below).
This work was later extended to a larger set of 13 100 life stories and to developing
tools and visualizations for biography and prosopography research, leading to the publication of the open semantic portal
BiographySampo - Finnish National Biographies on the Semantic Web, which soon gathered thousands of end users. (Cf. publications below.)
The BiographySampo.fi service aims at a paradigm shift in publishing biographies on the web.
The system is based on a Linked Data service on top of which several
search and browsing applications as well as tooling for data analysis, network analysis, and
visualizations are provided for biographical research on individual persons as well as for prosopographical research on groups of people.
The data was created by extracting knowledge from the
underlying biographical texts, over 13 000 short biographies published by the Finnish Literature Society,
using language technology, and by enriching the data by linking it to various external
biographical databases, Wikipedia/Wikidata, collection databases of memory organizations, semantic web data services etc.
BiographySampo is an important step in our series of semantic "Sampo" portals based on Linked Data, which includes the earlier systems
CultureSampo (2009),
BookSampo (2011),
TravelSampo (2011), and
WarSampo (2015).
BiographySampo was published on Sept 27, 2018.
These applications have been used by hundreds of thousands of people on the Semantic Web.
The vision and major functionalities of BiographySampo are presented in the short video below:
More information is available on the
BiographySampo homepage.
Reassembling the Republic of Letters
Research on biographical data is also of concern in the Reassembling the Republic of Letters project,
where SeCo leads the People and Networks Work Package, as well as in the Cultures of Knowledge project, where we collaborate with the University of Oxford and Stanford University on research into epistolary data of people active during the early modern period.
These projects were/are funded by Tekes and the Linked Data Finland consortium, and by the EU COST Office and Mellon Foundation.
Bio CRM: A Data Model for Representing Biographical Information for Prosopography
One of the results of the project is Bio CRM, a conceptual reference model for representation of biographical information, with focus on prosopographical research. Bio CRM provides a semantic data model for harmonizing and interlinking heterogeneous data from different biographical data sources. The model includes structures for basic data of people, personal relations, professions, and events with participants in different roles.
The core design principles of the data model are:
- The model is a domain specific extension of CIDOC CRM, making it applicable to not only biographical data but to other CH data, too.
- The model makes a distinction between enduring unary roles of actors, their enduring binary relationships, and perduring events, where the participants can take different roles modeled as a role concept hierarchy.
- The model can be used as a basis for semantic data validation and enrichment by reasoning.
- The enriched data conforming to Bio CRM is intended to be queried with SPARQL in flexible ways, using a hierarchy of roles in which participants can be involved in events.
See more information about Bio CRM in the working paper and the BD2017 paper.
Bio CRM schema: http://ldf.fi/schema/bioc/.
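The role hierarchy idea in the design principles above can be illustrated with a minimal Python sketch. It is not the Bio CRM schema or its SPARQL usage: the role names, the person and event identifiers, and the helper functions are all hypothetical and chosen only for illustration. The sketch shows how asking for participants in a general role can also return participants recorded with a more specific sub-role.

```python
from dataclasses import dataclass

# Hypothetical role concept hierarchy (child role -> parent role).
# These names are illustrative assumptions, not taken from Bio CRM.
ROLE_PARENT = {
    "Student": "Participant",
    "Teacher": "Participant",
    "Headmaster": "Teacher",
}


@dataclass(frozen=True)
class Participation:
    """One participant taking one role in one event."""
    event: str
    person: str
    role: str


def is_subrole(role, ancestor):
    """Return True if role equals ancestor or lies below it in the hierarchy."""
    current = role
    while current is not None:
        if current == ancestor:
            return True
        current = ROLE_PARENT.get(current)  # None once the top is reached
    return False


def participants_in_role(participations, role):
    """Return the sorted persons whose role is the given role or a sub-role of it."""
    return sorted({p.person for p in participations if is_subrole(p.role, role)})


# Invented example data.
PARTICIPATIONS = [
    Participation("school_year_1900", "person_a", "Headmaster"),
    Participation("school_year_1900", "person_b", "Student"),
    Participation("school_year_1901", "person_c", "Teacher"),
]

teachers = participants_in_role(PARTICIPATIONS, "Teacher")
```

Here a query for the general role "Teacher" also matches the person recorded as "Headmaster", which is the kind of flexibility a role hierarchy is meant to give.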
Reassembling the War History of Military Personnel and Units of WW2
A key goal of the project WarSampo - Finnish WW2 on the Semantic Web
is to reassemble the life stories and narratives of WW2 soldiers and army units based on Linked Open Data from different data sources.
For this, event-based modeling and an extension of CIDOC CRM are used. (Cf. publications below.)
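As a rough illustration of event-based reassembly, the Python sketch below merges event records from two sources into one chronological timeline per person. It is a simplified sketch, not the WarSampo data model: the record fields, source names, and identifiers are invented, and it assumes the persons have already been reconciled to shared identifiers.

```python
from collections import defaultdict

# Hypothetical event records from two sources. Field names and
# identifiers are invented for illustration only.
SOURCE_A = [
    {"person": "soldier_1", "type": "Birth", "date": "1915-03-02", "source": "register_a"},
    {"person": "soldier_1", "type": "Promotion", "date": "1941-07-10", "source": "register_a"},
]
SOURCE_B = [
    {"person": "soldier_1", "type": "JoinedUnit", "date": "1939-11-30", "source": "register_b"},
    {"person": "soldier_2", "type": "Birth", "date": "1920-01-15", "source": "register_b"},
]


def reassemble(*sources):
    """Group events by person and sort each person's events by date.

    Assumes person identifiers are already shared across the sources.
    """
    timelines = defaultdict(list)
    for source in sources:
        for event in source:
            timelines[event["person"]].append(event)
    for events in timelines.values():
        # ISO 8601 date strings sort chronologically as plain strings.
        events.sort(key=lambda e: e["date"])
    return dict(timelines)


timelines = reassemble(SOURCE_A, SOURCE_B)
soldier_1_events = [e["type"] for e in timelines["soldier_1"]]
```

The resulting timeline for a person combines events that no single source contains on its own, while each event keeps a pointer to the source it came from.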
Publishing Printed Person Registries on the Semantic Web
In this project, a printed person registry (short biographies) of 10 000 alumni
of the prominent Finnish high school Norssi in 1867-1992 was digitized and transformed into a
semantic portal, enriching the data from external data sources, and providing the end user with
faceted search and visualization tools for prosopographical research. (Cf. publications below.)
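The faceted search idea can be sketched in a few lines of Python. This is only an illustration over invented in-memory records, not the portal's implementation: the facet names and values are assumptions. It shows the two basic operations, narrowing a group of people by facet selections and counting the values of another facet within the narrowed group.

```python
from collections import Counter

# Invented person records. The facets "occupation" and "birth_decade"
# are assumptions for illustration.
PEOPLE = [
    {"name": "person_a", "occupation": "teacher", "birth_decade": 1870},
    {"name": "person_b", "occupation": "engineer", "birth_decade": 1870},
    {"name": "person_c", "occupation": "teacher", "birth_decade": 1880},
    {"name": "person_d", "occupation": "teacher", "birth_decade": 1870},
]


def filter_people(people, selections):
    """Keep the people that match every selected facet value."""
    return [
        p for p in people
        if all(p.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(people, facet):
    """Count how many people have each value of the given facet."""
    return Counter(p[facet] for p in people)


teachers = filter_people(PEOPLE, {"occupation": "teacher"})
decades = facet_counts(teachers, "birth_decade")
```

Selecting one facet value and then inspecting the counts of another facet is what lets a researcher compare subgroups step by step.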
U.S. Congress Prosopographer
In this project, a database of 12 000 biographical records of U.S. Congress legislators (1789-2018) was transformed and enriched
into a Linked Data service on top of which
tools for biography and prosopography were created. (Cf. publications below.)
Contact Persons
Prof. Eero Hyvönen, Aalto University and University of Helsinki (HELDIG)
MSc Petri Leskinen, Aalto University
Dr. Jouni Tuominen, University of Helsinki (HELDIG) and Aalto University
Publications
2023
Joonas Kesäniemi, Matthias Schlögl, Jouni Tuominen, Victor de Boer and Go Sugimoto:
Towards Reusable Aggregated Biographical Research Data: Provenance and Versioning in the InTaVia Knowledge Graph.
Digital Humanities in the Nordic and Baltic Countries Seventh Conference (DHNB 2023), Book of Abstracts (Sofie Gilbert and Annika Rockenberger (eds.)), pp. 117, University of Oslo Library, Oslo, Norway, March, 2023.
Matthias Schlögl, Joonas Kesäniemi, Jouni Tuominen, Victor de Boer, Go Sugimoto and Carla Ebel:
Dos and Don’ts of Building a Pan-European Biographical Knowledge Graph: Statistical Analysis of the InTaVia-Platform.
Digital Humanities in the Nordic and Baltic Countries Seventh Conference (DHNB 2023), Book of Abstracts (Sofie Gilbert and Annika Rockenberger (eds.)), pp. 106, University of Oslo Library, Oslo, Norway, March, 2023.
Minna Tamper, Petri Leskinen, Eero Hyvönen, Risto Valjus and Kirsi Keravuori:
Analyzing Biography Collection Historiographically as Linked Data: Case National Biography of Finland. Semantic Web – Interoperability, Usability, Applicability, vol. 14, no. 2, pp. 385-419, IOS Press, 2023.
2022
Bernardo S. Buarque, Aline Deicke, Malte Doehne, Martin Düring, Heiner Fangerau, Catherine Herfeld, Charles van den Heuvel, Eero Hyvönen, Roberto Lalli, Malte Vogl, Lea Weiß, Dirk Wintergrün:
White Paper of the ModelSEN Workshop (April 2022). October, 2022.
Angel Daza, Antske Fokkens, Richard Hadden, Eero Hyvönen, Mikko Koho and Eveline Wandl-Vogt:
Biographical Data in a Digital World 2022 (BD 2022) Workshop.
Digital Humanities 2022, Conference Abstracts, July 25-29, 2022, Online, Tokyo, Japan, University of Tokyo, pp. 39-42, ADHO, July, 2022.
Mikko Koho, Esko Ikkala and Eero Hyvönen:
Reassembling the Lives of Finnish Prisoners of the Second World War on the Semantic Web.
Proceedings of the Third Conference on Biographical Data in a Digital World (BD 2019), pp. 31-39, CEUR Workshop Proceedings, June, 2022.
This paper presents first results of a new, ninth application perspective for the semantic portal WarSampo - Finnish WW2 on the Semantic Web, based on a database of ca. 4450 Finnish prisoners of war in the Soviet Union. Our key idea is to reassemble the life of each prisoner of war by using Linked Data, based on information about the person in different data sources. Using the enriched aggregated data, a biographical global home page can be created for each prisoner of war that is more complete than the information in individual data sources. The application perspective is targeted to researchers of military history, to study and analyze the data in order to form new research questions or hypotheses, as well as to the public at large looking for information, e.g., about their relatives who were captured as prisoners of war. Employing the faceted search of the application perspective, prosopographical research on subgroups of prisoners is possible.
2021
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori:
Biografiasampo yhdistää ja rikastaa suomalaiset elämäkerrat linkitettynä datana semanttisessa webissä (BiographySampo links and enriches Finnish biographies as linked data on the Semantic Web). Informaatiotutkimus, vol. 40, no. 3, pp. 346-368, November, 2021.
Minna Tamper, Eero Hyvönen and Petri Leskinen:
Visualizing and Analyzing Networks of Named Entities in Biographical Dictionaries for Digital Humanities Research.
Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019), Springer-Verlag, October, 2021. Forthcoming.
This paper shows how named entity extraction and network analysis can be used to examine biographies individually and in groups to aid historians in biographical and prosopographical research. For this purpose, a reference network of 13 100 biographies in the collections of the Biographical Centre of the Finnish Literature Society was created, based on links between the biographies as well as automatically extracted named entities found in the texts. The data was published in a SPARQL endpoint as a Linked Data knowledge graph, on top of which network analytic tools were created and analyses were done, showing the usefulness of the approach in Digital Humanities. The reference graph has been utilized for network analysis to examine egocentric networks of individual persons as well as networks among groups of people in prosopography. The data and tools presented have been in use since autumn 2018 in the semantic portal BiographySampo, which has had tens of thousands of users.
2020
Mikko Koho, Petri Leskinen and Eero Hyvönen:
Integrating Historical Person Registers as Linked Open Data in the WarSampo Knowledge Graph.
Semantic Systems. In the Era of Knowledge Graphs. SEMANTiCS 2020 (Eva Blomqvist, Paul Groth, Victor de Boer, Tassilo Pellegrini, Mehwish Alam, Tobias Käfer, Peter Kieseberg, Sabrina Kirrane, Albert Meroño-Peñuela and Harshvardhan J. Pandit (eds.)), Lecture Notes in Computer Science, vol. 12378, pp. 118-126, Springer, Cham, Amsterdam, The Netherlands, October, 2020.
Semantic data integration from heterogeneous, distributed data silos enables Digital Humanities research and application development employing a larger, mutually enriched and interlinked knowledge graph. However, data integration is challenging, involving aligning the data models and reconciling the concepts and named entities, such as persons and places. This paper presents a record linkage process to reconcile person references in different military historical person registers with structured metadata. The information about persons is aggregated into a single knowledge graph. The process was applied to reconcile three person registers of the popular semantic portal WarSampo - Finnish World War 2 on the Semantic Web. The registers contain detailed information about some 100,000 people and are individually maintained by domain experts. Thus, the integration process needs to be automatic and adaptable to changes in the registers. An evaluation of the record linkage results is promising and provides some insight into military person register reconciliation in general.
2019
Howard Hotson, Thomas Wallnig, Jouni Tuominen, Eetu Mäkelä, and Eero Hyvönen:
People.
Reassembling the Republic of Letters in the Digital Age (H. Hotson and T. Wallnig (eds.)), pp. 119-136, Göttingen University Press, 2019.
Mikko Koho, Lia Gasbarra, Jouni Tuominen, Heikki Rantala, Ilkka Jokipii and Eero Hyvönen:
AMMO Ontology of Finnish Historical Occupations.
Proceedings of the First International Workshop on Open Data and Ontologies for Cultural Heritage (ODOCH 19) (Antonella Poggi (ed.)), vol. 2375, pp. 91-96, CEUR Workshop Proceedings, Rome, Italy, June, 2019.
This paper introduces AMMO Ontology of Finnish Historical Occupations. AMMO is based on thousands of occupation labels extracted from three Finnish military historical datasets of the early 20th century: the first consists of the ca. 40 000 war-related death records around the time of the Finnish Civil War (1914–1922); the second consists of the ca. 95 000 death records of Finnish soldiers in the Winter War and Continuation War (1939–1944); the third contains the ca. 4500 records of Finnish prisoners of war in the Soviet Union during WW2. Our goal from a Digital Humanities perspective is to use AMMO to study military history and these datasets based on the occupation and social status of the soldiers. AMMO will also be used as a component for faceted search and semantic recommendation in two semantic portals for Finnish military history. AMMO is aligned with the international historical occupation classification HISCO and with a modern Finnish occupational classification for international and national interoperability. The ontology is published as Linked Open Data in an ontology service.
Lia Gasbarra, Mikko Koho, Ilkka Jokipii, Heikki Rantala and Eero Hyvönen:
An Ontology of Finnish Historical Occupations.
The Semantic Web: ESWC 2019 Satellite Events (Hitzler, Pascal, Kirrane, Sabrina, Hartig, Olaf, de Boer, Victor, Vidal, Maria-Esther, Maleshkova, Maria, Schlobach, Stefan, Hammar, Karl, Lasierra, Nelia, Stadtmüller, Steffen, Hose, Katja and Verborgh, Ruben (eds.)), Lecture Notes in Computer Science, pp. 64-68, Springer, Cham, Portoroz, Slovenia, June, 2019.
Historical datasets often impose the need to study groups of people based on occupation or social status. This paper presents first results in creating an ontology of historical Finnish occupations, AMMO, that enables selection of groups of people based on their occupation, occupational groups, or socioeconomic class. AMMO is linked to the international historical occupation classification HISCO and to a modern Finnish occupational classification for interoperability. AMMO will be used as a component in two semantic portals for Finnish war history.
Petri Leskinen and Eero Hyvönen:
Extracting Genealogical Networks of Linked Data from Biographical Texts.
The Semantic Web: ESWC 2019 Satellite Events (Hitzler, P., Kirrane, S., Hartig, O., de Boer, V., Vidal, M.-E., Maleshkova, M., Schlobach, S., Hammar, K., Lasierra, N., Stadtmüller, S., Hose, K. and Verborgh, R. (eds.)), pp. 121-125, Springer, June, 2019.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori:
BiographySampo - Publishing and Enriching Biographies on the Semantic Web for Digital Humanities Research.
The Semantic Web. ESWC 2019 (Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J.G. Gray, Vanessa Lopez, Armin Haller and Karl Hammar (eds.)), pp. 574-589, Springer-Verlag, June, 2019.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori:
Demonstrating BiographySampo in Solving Digital Humanities Research Problems in Biography and Prosopography.
The Fourth Digital Humanities in the Nordic Countries 2019 (DHN2019), Book of Abstracts, University of Copenhagen, Copenhagen, Denmark, March, 2019.
Heikki Rantala:
Yhteyshaku semanttisessa webissä (Relational search on the Semantic Web). MSc Thesis (in Finnish), University of Helsinki, Department of Computer Science, January, 2019.
Ordinary search looks for individual entities, such as persons or places. In some situations, a historian, for example, may also be interested in searching for connections between persons and places. This thesis presents a method for implementing such relational search using open data available on the Semantic Web. A graph was built that contains descriptions of connections, assessed as interesting, between persons and places in Finnish cultural history. The graph was created with SPARQL CONSTRUCT queries. For searching the connections, a web application utilizing faceted search was created. Creating the required SPARQL CONSTRUCT queries did not prove particularly difficult, but applying them more generally to different datasets requires some work. Faceted search of connections proved interesting. Faceted search makes it possible to refine the search step by step. In addition, the relative numbers of connections can be compared under different constraints. This offers new perspectives on the data.
2018
Jouni Tuominen, Eero Hyvönen and Petri Leskinen:
Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research.
Proceedings of the Second Conference on Biographical Data in a Digital World 2017 (BD2017), vol. 2119, pp. 59-66, CEUR Workshop Proceedings, Linz, Austria, 2018.
Biographies make a promising application case of Linked Data: they can be used, e.g., as a basis for Digital Humanities research in prosopography and as a key data and linking resource in semantic Cultural Heritage (CH) portals. In both use cases, a semantic data model for harmonizing and interlinking heterogeneous data from different sources is needed. This paper presents such a data model, Bio CRM, with the following key ideas: 1) The model is a domain specific extension of CIDOC CRM, making it applicable to not only biographical data but to other CH data, too. 2) The model makes a distinction between enduring unary roles of actors, their enduring binary relationships, and perduring events, where the participants can take different roles modeled as a role concept hierarchy. 3) The model can be used as a basis for semantic data validation and enrichment by reasoning. 4) The enriched data conforming to Bio CRM is intended to be queried with SPARQL in flexible ways, using a hierarchy of roles in which participants can be involved in events.
Petri Leskinen, Eero Hyvönen and Jouni Tuominen:
Analyzing and Visualizing Prosopographical Linked Data Based on Biographies.
Proceedings of the Second Conference on Biographical Data in a Digital World 2017 (BD2017), vol. 2119, pp. 39-44, CEUR Workshop Proceedings, Linz, Austria, 2018.
This paper shows how faceted search on biographical data can be utilized as a flexible basis for filtering target groups of people and, in particular, how generic data analysis and visualization tools can then be applied for solving prosopographical research questions based on the filtered data. This idea is demonstrated and evaluated in practice by presenting two application case studies: 1) linked data extracted from a printed registry of over 10 000 alumni (1867–1992) of the prominent Finnish high school Norssi, and 2) a knowledge graph extracted from 13 000 short biographies of significant Finnish people (from 3rd century to present times) in the National Biography of Finland. In both cases, the data is enriched by linking their entities with several other external datasets.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori:
Biografiasammon tekoäly yhdistää ja rikastaa suomalaiset elämäkerrat semanttisessa webissä (BiographySampo's artificial intelligence links and enriches Finnish biographies on the Semantic Web). Aalto University, Semantic Computing Research Group (SeCo), November, 2018.
The BiographySampo system starts a new era in publishing and using biography collections on the web. The core data of the system consists of the National Biography of Finland and other short biographies edited by the Finnish Literature Society (SKS) and scholarly societies, 13 100 life stories in total, written by 900 Finnish researchers. The innovation of BiographySampo is to use language technology, artificial intelligence, and Semantic Web technologies to create, from the biography texts and from related information in different sources, a knowledge graph and a national data infrastructure consisting of millions of connections between pieces of data. The knowledge graph has been published in a Linked Data service, on top of which an intelligent, open, and free web service, biografiasampo.fi, consisting of seven application perspectives, has been implemented for use by citizens and Digital Humanities researchers.
Minna Tamper, Petri Leskinen, Kasper Apajalahti and Eero Hyvönen:
Using Biographical Texts as Linked Data for Prosopographical Research and Applications.
Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus (Marinos Ioannides, Eleanor Fink, Raffaella Brumana, Petros Patias, Anastasios Doulamis, João Martins and Manolis Wallace (eds.)), pp. 125-137, Springer-Verlag, November, 2018.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen and Kirsi Keravuori:
Semantic National Biography of Finland.
Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), pp. 372-385, CEUR Workshop Proceedings, Vol-2084, Helsinki, Finland, March, 2018.
This paper presents the vision of publishing and utilizing textual biographies as Linked (Open) Data on the Semantic Web. As a case study, we publish the life stories of the National Biography of Finland, created by the Finnish Literature Society, as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint using the Linked Data Finland (LDF.fi) service. On top of the data service various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research of groups of persons using methods of prosopography. The biographical data is enriched by extracting events from unstructured and semi-structured texts, and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for prosopographical research. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional data sources, such as museum collections, library databases, and archives.
2017
Petri Leskinen, Jouni Tuominen, Erkki Heino and Eero Hyvönen:
An Ontology and Data Infrastructure for Publishing and Using Biographical Linked Data.
Proceedings of the Workshop on Humanities in the Semantic Web (WHiSe II), pp. 15-26, CEUR Workshop Proceedings, Vol. 2014, Vienna, Austria, October, 2017.
This paper describes the ontology model and published datasets of a digitized biographical person register. The applied ontology model is designed to represent people via their enduring roles and perduring lifetime events. The model is designed to support 1) prosopographical Digital Humanities research, 2) linking to resources in semantic Cultural Heritage portals, and 3) semantic data validation and enrichment by using SPARQL queries. The linked data approach makes it possible to enrich a person's biography by interlinking it with space and time related biographical events, persons related by social contacts or family relations, historical events, and personal achievements.
Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuominen and Laura Sirola:
Reassembling and Enriching the Life Stories in Printed Biographical Registers: Norssi High School Alumni on the Semantic Web.
Proceedings, Language, Data and Knowledge (LDK 2017), pp. 113-119, Springer-Verlag, Galway, Ireland, June, 2017.
This paper presents the idea of enriching printed biographical person registers with linked data related to events that took place after the register was published. By transforming printed historical documents into structured data, semantic search over written texts can be provided for the reader. Even more importantly, life stories of historical persons can be extended based on data linking by extracting semantic structures from printed texts, and by combining this data with external datasets and data services. Such linking provides an enriched context for prosopographical research on people in the register, as well as an enhanced reading experience for anyone interested in reading the biographies. As a concrete case study, a register 1867–1992 of over 10 000 alumni of the prominent Finnish high school “Norssi” was transformed into RDF, was enriched by data linking, was published as a linked data service, and is provided to end users via a faceted search engine and browser for studying lives of historical persons and for prosopographical research.
2014
Eero Hyvönen, Miika Alonen, Esko Ikkala and Eetu Mäkelä:
Life Stories as Event-based Linked Data: Case Semantic National Biography.
Proceedings of ISWC 2014 Posters & Demonstrations Track, CEUR Workshop Proceedings, October, 2014.
This paper argues, by presenting a case study and a demonstration on the web, that biographies make a promising application case of Linked Data: the reading experience can be enhanced by enriching the biographies with additional lifetime events, by providing the user with a spatio-temporal context for reading, and by linking the text to additional contents in related datasets.