Finnish National Biographies on the Semantic Web
Paradigm Shift: Biographies as Linked Data
Biographical dictionaries, a historical genre dating back to antiquity, are scholarly resources used by the public and by the academic community alike. Most national biographical dictionaries follow the traditional form of combining a lengthy unstructured
text, often written with authorial individuality and personal insight, with a structured
supplement of basic biographical facts, such as family, education, and works.
Biographies are an invaluable information source for researchers across disciplines
with an interest in the past.
Biographical on-line collections may contain tens of thousands of short biographies
of historical persons of national importance, whose contents are interlinked by historical events, places, acquaintances, family relations, times, objects, traditions, etc. The
Oxford Dictionary of National Biography, with more than 60 000 lives, was first
published on-line in 2004, and since then major biographical dictionaries have opened
their editions on the Web. Other on-line national biographical collections include the
American National Biography (USA), the Neue Deutsche Biographie (Germany), the
Nouvelle Biographie Générale (France), the Biography Portal of the Netherlands, BiographyNet,
and the Dictionary of Swedish National Biography.
From National Biography of Finland to BiographySampo
The National Biography of Finland, published by the Finnish Literature Society, started out as an on-line publication in 1997, a
decade before its publication as a 10-volume book series was completed. A selection of
its 6 000 lives was published on-line in Swedish as Biografiskt lexikon för Finland. In
addition to the National Biography of Finland, the Biographical Centre of the Finnish
Literature Society has published several other peer-reviewed biographical collections
on-line, such as the Finnish Business Leaders, the Finnish Clergy (1552–1920), and the
Finnish Generals and Admirals in the Russian Armed Forces (1809–1917).
Even though a lot of biographical information is available online for humans to read and
interpret, it is seldom available as machine-readable data for 1) Digital
Humanities research or 2) use in Cultural Heritage (CH) portals, such as Europeana
and the Digital Public Library of America, or in CH applications for the public.
Furthermore, the information is distributed across different national data silos in
heterogeneous formats and written in different languages. This makes aggregating and
reusing biographical data challenging.
BiographySampo aims to make a paradigm shift in publishing biographies on the web.
The system is based on a Linked Data service, on top of which several
search and browsing applications, as well as tooling for data analysis, network analysis, and
visualization, are provided for biographical research on individual persons and for prosopographical research on groups of people.
The data was created by extracting knowledge from the
underlying biographical texts, over 13 000 short biographies published by the Finnish Literature Society,
using language technology, and by enriching it with links to various external
biographical databases, Wikipedia/Wikidata, collection databases of memory organizations, semantic web data services, etc.
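As a rough illustration of how a knowledge graph supports prosopographical research on groups of people, the following Python sketch stores a few hypothetical lives as subject-predicate-object triples and applies faceted filtering: selecting a facet value narrows the group, and counting the values of another facet summarizes the selected group. The person identifiers, the property names `occupation` and `birthPlace`, and all values are invented for illustration; the real BiographySampo data uses a CIDOC CRM-based model and is served through a SPARQL endpoint, not queried with code like this.

```python
from collections import Counter

# Hypothetical triples (subject, predicate, object) for illustration only;
# not BiographySampo's actual data or its CIDOC CRM-based schema.
TRIPLES = [
    ("p1", "occupation", "painter"),
    ("p1", "birthPlace", "Helsinki"),
    ("p2", "occupation", "painter"),
    ("p2", "birthPlace", "Turku"),
    ("p3", "occupation", "composer"),
    ("p3", "birthPlace", "Helsinki"),
    ("p4", "occupation", "painter"),
    ("p4", "birthPlace", "Helsinki"),
]


def values(person, predicate):
    """Return the set of objects linked to a person by a predicate."""
    return {o for s, p, o in TRIPLES if s == person and p == predicate}


def filter_group(selections):
    """Return the people matching every facet selection (predicate -> value)."""
    people = {s for s, _, _ in TRIPLES}
    return {
        person
        for person in people
        if all(value in values(person, pred) for pred, value in selections.items())
    }


def facet_counts(group, predicate):
    """Count facet values within a selected group, as a facet list would show."""
    return Counter(o for person in group for o in values(person, predicate))


painters = filter_group({"occupation": "painter"})
print(sorted(painters))                       # ['p1', 'p2', 'p4']
print(facet_counts(painters, "birthPlace"))   # Counter({'Helsinki': 2, 'Turku': 1})
```

Adding a second selection, such as `{"occupation": "painter", "birthPlace": "Helsinki"}`, narrows the group further, which mirrors how faceted search lets the user refine a query one step at a time.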
BiographySampo is the next step in our series of semantic "Sampo" portals based on Linked Data, including
TravelSampo (2011) and WarSampo (2015).
These applications on the Semantic Web have been used by hundreds of thousands of people.
The vision and major functionalities of BiographySampo are presented in the short video below:
Examples of Using BiographySampo
See the use case page with screenshots, available in English via Google Translate.
Data Service and Semantic Portal
The idea and functionalities of BiographySampo are summarized in the article
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen, and Kirsi Keravuori:
BiographySampo - Publishing and Enriching Biographies on the Semantic Web for Digital Humanities Research
and in the publications below.
The BiographySampo data service and the semantic portal on top of it were published on September 27, 2018.
More information is available in the publications below and on the Finnish homepage (with English versions via Google Translate).
Join the open Facebook group Semantic Biographies.
Prof., Director Eero Hyvönen, Aalto University, Semantic Computing Research Group (SeCo), and University of Helsinki, Helsinki Centre for Digital Humanities (HELDIG)
Dr. Kirsi Keravuori, Finnish Literature Society, Biography Center
Other members of the BiographySampo team:
MSc Petri Leskinen, Aalto University, Semantic Computing Research Group (SeCo)
MSc Minna Tamper, Aalto University, Semantic Computing Research Group (SeCo)
Dr. Jouni Tuominen, University of Helsinki, Helsinki Centre for Digital Humanities (HELDIG)
Heikki Rantala, Aalto University, Semantic Computing Research Group (SeCo), and University of Helsinki
Mikko Koho, Lia Gasbarra, Jouni Tuominen, Heikki Rantala, Ilkka Jokipii and Eero Hyvönen: AMMO Ontology of Finnish Historical Occupations. 2019.
This paper introduces the AMMO Ontology of Finnish Historical Occupations. AMMO is based on thousands of occupation labels extracted from three Finnish military historical datasets of the early 20th century: the first consists of the ca. 40 000 war-related death records around the time of the Finnish Civil War (1914–1922); the second consists of the ca. 95 000 death records of Finnish soldiers in the Winter War and Continuation War (1939–1944); the third contains the ca. 4 500 records of Finnish prisoners of war in the Soviet Union during World War II. Our goal from a Digital Humanities perspective is to use AMMO to study military history and these datasets based on the occupation and social status of the soldiers. AMMO will also be used as a component for faceted search and semantic recommendation in two semantic portals for Finnish military history. AMMO is aligned with the international historical occupation classification HISCO and with a modern Finnish occupational classification for international and national interoperability. The ontology is published as Linked Open Data in an ontology service.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: BiographySampo – A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web. November 2018.
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the web, based on Linked Data. Firstly, a biographical dictionary on the web should provide the end user with an enhanced reading experience of biographies by enriching them with data linking and reasoning. Secondly, the web publication should include not only biographies for humans to read but also versatile tooling for 1) biographical research on individual persons as well as 2) prosopographical research on groups of people. To support these arguments, we present the design principles and the implementation of the semantic portal “BiographySampo – Finnish Life Stories on the Semantic Web”, especially from the end user’s point of view. The system is based on a Linked Data service and a knowledge graph extracted automatically from a collection of 13 100 textual biographies, written by 900 researchers. The texts are enriched by linking their data to 16 external data sources and by harvesting external collection data from libraries, museums, and archives. The portal, consisting of seven different interlinked application perspectives, was released on September 27, 2018, for free public use by Digital Humanities researchers and the general public.
Lia Gasbarra, Mikko Koho, Ilkka Jokipii, Heikki Rantala and Eero Hyvönen: An Ontology of Finnish Historical Occupations. Proceedings of the 16th Extended Semantic Web Conference (ESWC 2019), Posters & Demonstrations, Springer, Portorož, Slovenia, June 2019.
Historical datasets often impose the need to study groups of people based on occupation or social status. This paper presents first results in creating an ontology of historical Finnish occupations, AMMO, that enables selection of groups of people based on their occupation, occupational groups, or socioeconomic class. AMMO is linked to the international historical occupation classification HISCO and to a modern Finnish occupational classification for interoperability. AMMO will be used as a component in two semantic portals for Finnish war history.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: Demonstrating BiographySampo in Solving Digital Humanities Research Problems in Biography and Prosopography. The Fourth Digital Humanities in the Nordic Countries 2019 (DHN2019), CEUR Workshop Proceedings, Copenhagen, Denmark, March 2019.
Heikki Rantala: Yhteyshaku semanttisessa webissä [Connection Search on the Semantic Web]. MSc thesis (in Finnish), University of Helsinki, Department of Computer Science, January 2019.
Ordinary search looks for individual entities, such as persons or places. In some situations, however, a historian, for example, may also be interested in searching for connections between persons and places. This thesis presents a method for implementing such connection search using open data available on the Semantic Web. A graph was constructed containing descriptions of connections, assessed to be of interest, between persons and places in Finnish cultural history. The graph was created with SPARQL CONSTRUCT queries. For searching the connections, a web application utilizing faceted search was built. Creating the required SPARQL CONSTRUCT queries did not prove particularly difficult, but applying them more generally to different datasets requires some work. Faceted search of connections proved interesting: it allows the search to be refined one step at a time, and the relative numbers of connections can be compared under different constraints. This offers new perspectives on the data.
Petri Leskinen, Eero Hyvönen and Jouni Tuominen: Analyzing and Visualizing Prosopographical Linked Data Based on Biographies. Proceedings of the Second Conference on Biographical Data in a Digital World 2017 (BD2017), vol. 2119, pp. 39-44, CEUR Workshop Proceedings, Linz, Austria, 2018.
This paper shows how faceted search on biographical data can be utilized as a flexible basis for filtering target groups of people and, in particular, how generic data analysis and visualization tools can then be applied to solving prosopographical research questions based on the filtered data. This idea is demonstrated and evaluated in practice by presenting two application case studies: 1) linked data extracted from a printed registry of over 10 000 alumni (1867–1992) of the prominent Finnish high school Norssi, and 2) a knowledge graph extracted from 13 000 short biographies of significant Finnish people (from the 3rd century to the present) in the National Biography of Finland. In both cases, the data is enriched by linking its entities with several other external datasets.
Jouni Tuominen, Eero Hyvönen and Petri Leskinen: Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research. Proceedings of the Second Conference on Biographical Data in a Digital World 2017 (BD2017), vol. 2119, pp. 59-66, CEUR Workshop Proceedings, Linz, Austria, 2018.
Biographies make a promising application case of Linked Data: they can be used, e.g., as a basis for Digital Humanities research in prosopography and as a key data and linking resource in semantic Cultural Heritage (CH) portals. In both use cases, a semantic data model for harmonizing and interlinking heterogeneous data from different sources is needed. This paper presents such a data model, Bio CRM, with the following key ideas: 1) The model is a domain-specific extension of CIDOC CRM, making it applicable not only to biographical data but also to other CH data. 2) The model makes a distinction between enduring unary roles of actors, their enduring binary relationships, and perduring events, where the participants can take different roles modeled as a role concept hierarchy. 3) The model can be used as a basis for semantic data validation and enrichment by reasoning. 4) The enriched data conforming to Bio CRM is intended to be queried flexibly with SPARQL, using the hierarchy of roles in which participants can be involved in events.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen and Kirsi Keravuori: Biografiasammon tekoäly yhdistää ja rikastaa suomalaiset elämäkerrat semanttisessa webissä [The artificial intelligence of BiographySampo connects and enriches Finnish biographies on the Semantic Web]. Aalto University, Semantic Computing Research Group (SeCo), November 2018.
The BiographySampo system launches a new era in publishing and using biography collections on the web. The core data of the system consists of the National Biography of Finland and other short biographies edited by the Finnish Literature Society (SKS) and learned societies, a total of 13 100 life stories written by 900 Finnish researchers. The innovation of BiographySampo is to use language technology, artificial intelligence, and Semantic Web technologies to create, from the biography texts and related data in different sources, a knowledge graph and a national data infrastructure consisting of millions of links between data items. The knowledge graph is published in a Linked Data service, on top of which the intelligent web service biografiasampo.fi, consisting of seven application perspectives, has been implemented; it is open and free of charge for use by the public and by Digital Humanities researchers.
Minna Tamper, Petri Leskinen, Kasper Apajalahti and Eero Hyvönen: Using Biographical Texts as Linked Data for Prosopographical Research and Applications. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, Springer-Verlag, November 2018.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen and Kirsi Keravuori: Semantic National Biography of Finland. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), pp. 372-385, CEUR Workshop Proceedings, Vol-2084, Helsinki, Finland, March 2018.
This paper presents the vision of publishing and utilizing textual biographies as Linked (Open) Data on the Semantic Web. As a case study, we publish the life stories of the National Biography of Finland, created by the Finnish Literature Society, as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint using the Linked Data Finland (LDF.fi) service. On top of the data service, various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research on groups of persons using methods of prosopography. The biographical data is enriched by extracting events from unstructured and semi-structured texts, and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for prosopographical research. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional data sources, such as museum collections, library databases, and archives.
Petri Leskinen, Jouni Tuominen, Erkki Heino and Eero Hyvönen: An Ontology and Data Infrastructure for Publishing and Using Biographical Linked Data. Proceedings of the Workshop on Humanities in the Semantic Web (WHiSe II), CEUR Workshop Proceedings, Vienna, Austria, October 2017.
This paper describes the ontology model and published datasets of a digitized biographical person register. The applied ontology model is designed to represent people via their enduring roles and perduring lifetime events. The model is designed to support 1) prosopographical Digital Humanities research, 2) linking to resources in semantic Cultural Heritage portals, and 3) semantic data validation and enrichment by using SPARQL queries. The linked data approach makes it possible to enrich a person’s biography by interlinking it with space- and time-related biographical events, persons related through social contacts or family relations, historical events, and personal achievements.
Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuominen and Laura Sirola: Reassembling and Enriching the Life Stories in Printed Biographical Registers: Norssi High School Alumni on the Semantic Web. Proceedings, Language, Data and Knowledge (LDK 2017), pp. 113-119, Springer-Verlag, Galway, Ireland, June 2017.
This paper presents the idea of enriching printed biographical person registers with linked data related to events that took place after the register was published. By transforming printed historical documents into structured data, semantic search over the written texts can be provided for the reader. Even more importantly, life stories of historical persons can be extended based on data linking by extracting semantic structures from printed texts and by combining this data with external datasets and data services. Such linking provides an enriched context for prosopographical research on people in the register, as well as an enhanced reading experience for anyone interested in reading the biographies. As a concrete case study, a register of over 10 000 alumni (1867–1992) of the prominent Finnish high school “Norssi” was transformed into RDF, enriched by data linking, and published as a linked data service; it is provided to end users via a faceted search engine and browser for studying lives of historical persons and for prosopographical research.
Eero Hyvönen, Miika Alonen, Esko Ikkala and Eetu Mäkelä: Life Stories as Event-based Linked Data: Case Semantic National Biography. Proceedings of ISWC 2014 Posters & Demonstrations Track, CEUR Workshop Proceedings, October 2014.
This paper argues, by presenting a case study and a demonstration on the web, that biographies make a promising application case of Linked Data: the reading experience can be enhanced by enriching the biographies with additional lifetime events, by providing the user with a spatio-temporal context for reading, and by linking the text to additional contents in related datasets.
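The Bio CRM paper listed above describes events whose participants take roles organized in a role concept hierarchy, so that querying for a general role also finds its more specific subroles. The Python sketch below illustrates only that part of the model, under stated assumptions: the role names, the parent links, the people, and the events are hypothetical and do not come from Bio CRM's actual vocabulary or data. Against real RDF data, the same transitive match could be expressed with a SPARQL 1.1 zero-or-more property path (`*`) over whatever property links a role to its broader role, rather than with a Python loop.

```python
# Hypothetical role hierarchy: each role maps to its direct broader role.
# Illustrative only; not the actual Bio CRM role vocabulary.
ROLE_PARENT = {
    "student": "participant",
    "teacher": "participant",
    "supervisor": "teacher",
    "bride": "spouse",
    "groom": "spouse",
    "spouse": "participant",
}

# Hypothetical events: (event id, event type, [(person, role), ...]).
EVENTS = [
    ("e1", "education", [("Aino", "student"), ("Elias", "supervisor")]),
    ("e2", "marriage", [("Aino", "bride"), ("Juho", "groom")]),
    ("e3", "education", [("Juho", "student"), ("Elias", "teacher")]),
]


def is_subrole(role, ancestor):
    """True if role equals ancestor or lies below it in the hierarchy."""
    while role is not None:
        if role == ancestor:
            return True
        role = ROLE_PARENT.get(role)  # step up to the broader role
    return False


def events_in_role(person, role):
    """Events in which the person takes the given role or any of its subroles."""
    return [
        event_id
        for event_id, _, participants in EVENTS
        for who, their_role in participants
        if who == person and is_subrole(their_role, role)
    ]


print(events_in_role("Elias", "teacher"))     # ['e1', 'e3']
print(events_in_role("Aino", "spouse"))       # ['e2']
print(events_in_role("Aino", "participant"))  # ['e1', 'e2']
```

Walking up from each participant's role, rather than first expanding the query role into all of its subroles, keeps the sketch short; for a tree-shaped hierarchy like this one, both approaches return the same events.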