WarSampo:
Finnish World War II on the Semantic Web

Goal: Understanding History and Promoting Peace

According to Georg Wilhelm Friedrich Hegel we learn from history that we learn nothing from history. Hopefully this is not the case for the Second World War (WW2), now that fighting has started again even within the borders of Europe in Ukraine. One way to promote peace is to make reliable data about the war openly available for everybody to learn.

WarSampo is an initiative that aims at this goal by harmonizing and publishing large heterogeneous sets of data about WW2 in Finland as Linked Open Data (LOD). Application demonstrators are built that provide different perspectives on war history, for both historians and the public. The data covers the Winter War 1939-1940 against the Soviet attack, the Continuation War 1941-1944 where the occupied areas of the Winter War were temporarily regained, and the Lapland War 1944-1945, where the Finns pushed the German troops away from Lapland.

Video about WarSampo (2021):

The WarSampo Semantic Portal is the next step in our series of "Sampo" portals based on Linked Data, including CultureSampo, BookSampo, and TravelSampo, and continues our earlier work on modeling the First World War as Linked Data.

The WarSampo initiative started in autumn 2014. In summer 2017 in Venice WarSampo won the international LODLAM Technical Challenge Open Data Prize.

Figure: Mäkiluoto artillery fires at the Battle of Hanko in 1942. Finnish Wartime Photograph Archive, Defence Forces.

Prototype Demonstrator Online since Nov 9th, 2015

Try the WarSampo Semantic Portal online at http://www.sotasampo.fi/en/.

See a video about the WarSampo initiative and portal:

The videos below show how different perspectives of the WarSampo portal are used in more detail:

Video Presentation about the WarSampo initiative

"WarSampo Data Service and Semantic Portal for Publishing Linked Open Data about the Second World War History". Recorded presentation at the Extended Semantic Web Conference 2016 (ESWC 2016), courtesy of Videolectures.net.

Modeling War History as Linked Data

Source Data

The following table presents the source datasets, which are obtained from a network of collaborating and data publishing organizations, such as the National Archives of Finland, the Finnish Defence Forces, and the National Land Survey of Finland. Some source datasets have been created as part of the WarSampo initiative and related research.

Table: Source datasets of WarSampo, grouped by providing organization.

Source Dataset | Providing Organization | Used Content | Source Format
--- | --- | --- | ---
Casualties of WW2 | The National Archives of Finland | 94 700 person records | spreadsheet
War diaries | The National Archives of Finland | 26 400 war diaries with metadata, 9850 units, and 12 people | spreadsheet
Senate atlas | The National Archives of Finland | 414 historical maps of Finland | digital images
Municipalities | The National Archives of Finland | 625 wartime municipalities | digital text
Organization cards | The National Archives of Finland | 132 military units, 279 people, and 642 battles | digital images, PDF documents
Units of The Finnish Army 1941-1945 | The National Archives of Finland | 8810 military units | digital text, PDF document
Wartime photographs | The Finnish Defence Forces | 164 000 photos with metadata, 1740 people | spreadsheet, API access
Kansa Taisteli magazine articles | The Association for Military History in Finland, Bonnier Publications | 3360 articles by war veterans | spreadsheet, PDF documents
Karelian places | The National Land Survey of Finland | 32 400 places of the annexed Karelia | spreadsheet
Karelian maps | The National Land Survey of Finland | 47 wartime maps of Karelia | digital images
Finnish Place Name Register | The National Land Survey of Finland | 798 000 contemporary place names | XML
National Biography | The Finnish Literature Society | 699 biographies | spreadsheet
War cemeteries | The Central Organization of Finnish Camera Clubs | 672 cemeteries & 2450 photographs | spreadsheet, digital images
Prisoners of war | The National Prisoners of War Project | 4450 person records | spreadsheet
Wikipedia | Wikimedia Foundation | 3010 people, 255 military units | API, web pages
Knights of the Mannerheim Cross | Knights of the Mannerheim Cross Foundation | 191 people, 1120 medal awardings | API, web pages
Military historical literature | - | 1050 war events, 2900 military units, 585 people | printed text
Finnish Spatio-Temporal Ontology | Aalto University | 488 polygons of wartime municipalities | RDF
AMMO Ontology of Finnish Historical Occupations | Aalto University | 3090 occupational labels | RDF

Metadata models

The CIDOC Conceptual Reference Model (CRM) is used as the harmonizing basis for modeling data, with events providing the semantic glue for data linking. Our earlier data model for WW1 is used and extended as the metadata model to start with.
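The idea of events acting as the glue between otherwise separate resources can be illustrated with a minimal sketch. The class and property names (`crm:E5_Event`, `crm:P11_had_participant`, `crm:P7_took_place_at`) come from CIDOC CRM, but all `ex:` identifiers are hypothetical, and the sketch assumes a plain in-memory list of triples rather than the actual WarSampo schema or data.

```python
from collections import defaultdict

# Hypothetical example triples (subject, predicate, object).
# The ex: identifiers are invented for illustration only.
TRIPLES = [
    ("ex:event_1", "rdf:type", "crm:E5_Event"),
    ("ex:event_1", "crm:P11_had_participant", "ex:unit_A"),
    ("ex:event_1", "crm:P11_had_participant", "ex:person_X"),
    ("ex:event_1", "crm:P7_took_place_at", "ex:place_P"),
    ("ex:event_2", "rdf:type", "crm:E5_Event"),
    ("ex:event_2", "crm:P11_had_participant", "ex:unit_A"),
    ("ex:event_2", "crm:P7_took_place_at", "ex:place_Q"),
]


def linked_through_events(resource, triples=TRIPLES):
    """Return every resource that shares at least one event with `resource`."""
    # Collect the participants and places attached to each event.
    by_event = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate != "rdf:type":
            by_event[subject].add(obj)

    # Any event mentioning the resource links it to that event's other members.
    related = set()
    for members in by_event.values():
        if resource in members:
            related |= members - {resource}
    return related
```

In this toy data the unit and the person never reference each other directly; they become connected only because both participate in `ex:event_1`, which is the kind of linking the event-centric model is meant to support.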

Domain Ontologies

The data is annotated using a set of domain ontologies, including:

The data and domain ontologies are published as a harmonized Knowledge Graph, which forms the basis of the WarSampo Semantic Portal and its perspectives. The idea of the portal is to provide a variety of perspectives on war data. Most datasets have their own perspective, where the user can first search for data of interest and then get linked data related to the resources found. The WarSampo Knowledge Graph is published under the open CC BY license, and it is available with documentation at http://www.ldf.fi/dataset/warsa.

Data service

The WarSampo Knowledge Graph is hosted on the "7-star" Linked Data Finland platform, based on Apache Jena Fuseki with a Varnish Cache for serving Linked Open Data.
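Because the data service is built on Apache Jena Fuseki, the Knowledge Graph can in principle be queried over the standard SPARQL 1.1 Protocol. The following is a minimal sketch using only the Python standard library. The endpoint URL is an assumption and should be checked against the dataset documentation; the query is a generic class count that does not rely on any WarSampo-specific vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint address; verify it in the dataset documentation.
ENDPOINT = "http://ldf.fi/warsa/sparql"

# Generic query: the ten most frequent classes in the graph.
QUERY = """
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?count)
LIMIT 10
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into plain dictionaries."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]
```

To run the query, pass the request to `urllib.request.urlopen`, decode the response body as UTF-8, and hand the text to `rows`. Sending the request is kept separate from building it so that the query and the result parsing can be tested without network access.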

Follow-up Projects on Developing WarSampo

  1. In 2016, a new, seventh application perspective was published, based on the 160 000 authentic WW2 photos provided by the Defense Forces of Finland: Photographs of Winter War, Continuation War, and Lapland War.
  2. Another new, eighth application perspective, War Cemeteries in Finland, was developed in a separate project to celebrate the centennial of Finnish independence in 2017.
  3. The project Finnish prisoner of war in Soviet Union 1939-1945 produced yet another new, ninth application perspective, Prisoners of War 1939-45 in WarSampo, for the 80th anniversary of the Winter War in 2019.
  4. The latest project on WarSampo is: WarSampo: Citizen Science.

Collaboration Network and Funding

Our collaborator and data provider network consists of various organizations and data publishers on the web, including:

The WarSampo initiative has been funded by the following organizations:

WarSampo was part of the Centenary of Finland's Independence 2017 programme coordinated by the Prime Minister's Office.

WarSampo was one of the Finnish proposals for the EU Prize of Cultural Heritage / Europa Nostra Awards.

WarSampo was awarded the Open Data prize in the 2017 LODLAM Challenge.

More Information

More information is available in the publications below, at the Finnish homepage, and in the initiative description (PDF, in Finnish).

Contact Person

Prof. Eero Hyvönen, Aalto University


Publications

2024

Petri Leskinen: Modeling and Using Biographical Linked Data for Prosopographical Data Analysis. Dissertation, Aalto University, School of Science, Department of Computer Science, October, 2024. bib pdf
Eero Hyvönen: Military History on the Semantic Web: Lessons Learned from Developing Three In-use Linked Open Data Services and Semantic Portals for Digital Humanities. Intelligent Computing for Cultural Heritage: Global Developments and China's Innovations, Taylor & Francis, Routledge, July, 2024. Book chapter. bib pdf link
Eero Hyvönen: How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Semantic Web - Interoperability, Usability, Applicability, IOS Press, 2024. DOI: 10.3233/SW-243468. bib pdf link

2023

Mikko Koho and Eero Hyvönen: Studying Occupations and Social Measures of Perished Soldiers in WarSampo Linked Open Data. Biographical Data in a Digital World 2022 (BD 2022), Tokyo, CEUR Workshop Proceedings, August, 2023. bib pdf
Eero Hyvönen: Creating and Using a National Linked Open Data Infrastructure for Cultural Heritage Applications and Digital Humanities Research: Lessons Learned. DARIAH Annual Event 2023, Budapest, Hungary, abstracts of papers, DARIAH-EU, June, 2023. bib link
Eero Hyvönen: How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Programming and Data Infrastructure in Digital Humanities, Book of Abstracts, pp. 7, High Performance Computing Centre, University of Évora, Portugal, March, 2023. bib link
Eero Hyvönen: Digital Humanities on the Semantic Web: Sampo Model and Portal Series. Semantic Web – Interoperability, Usability, Applicability, vol. 14, no. 4, pp. 729-744, IOS Press, 2023. bib pdf link

2022

Joonas Kesäniemi, Mikko Koho, Esko Ikkala and Eero Hyvönen: Using Wikibase for Managing Cultural Heritage Linked Open Data Based on CIDOC CRM. New Trends in Database and Information Systems, pp. 542-549, Springer International Publishing, August, 2022. bib pdf link
Eero Hyvönen, Esko Ikkala, Mikko Koho, Rafael Leal, Heikki Rantala and Minna Tamper: How to Search and Contextualize Scenes inside Videos for Enriched Watching Experience: Case Stories of the Second World War Veterans. The Semantic Web: ESWC 2022 Satellite Events, Lecture Notes in Computer Science, vol. 13384, pp. 163-167, Springer, July, 2022. bib pdf link
Mikko Koho, Esko Ikkala and Eero Hyvönen: Reassembling the Lives of Finnish Prisoners of the Second World War on the Semantic Web. Proceedings of the Third Conference on Biographical Data in a Digital World (BD 2019), pp. 31-39, CEUR Workshop Proceedings, June, 2022. bib pdf link
This paper presents first results of a new, ninth application perspective for the semantic portal WarSampo - Finnish WW2 on the Semantic Web, based on a database of ca. 4450 Finnish prisoners of war in the Soviet Union. Our key idea is to reassemble the life of each prisoner of war by using Linked Data, based on information about the person in different data sources. Using the enriched aggregated data, a biographical global home page can be created for each prisoner of war that is more complete than the information in the individual data sources. The application perspective is targeted to researchers of military history, to study and analyze the data in order to form new research questions or hypotheses, as well as to the public at large looking for information e.g., about their relatives that were captured as prisoners of war. Employing the faceted search of the application perspective, prosopographical research on subgroups of prisoners is possible.
Mikko Koho, Heikki Rantala and Eero Hyvönen: Digital Humanities and Military History: Analyzing Casualties of the WarSampo Knowledge Graph. DHNB 2022 The 6th Digital Humanities in Nordic and Baltic Countries Conference (Karl Berglund, Matti La Mela and Inge Zwart (eds.)), vol. 3232, CEUR Workshop Proceedings, Uppsala, Sweden, March, 2022. bib pdf link
Joonas Kesäniemi, Mikko Koho, Esko Ikkala and Eero Hyvönen: Using Wikibase for Managing Cultural Heritage Linked Open Data Based on CIDOC CRM. 6th Digital Humanities in Nordic and Baltic Countries Conference, poster paper, pp. 74-75, March, 2022. Book of Abstracts. bib link

2021

Eero Hyvönen: Sammon taontaa semanttisessa webissä (Forging Sampos on the Semantic Web). Tekniikan Waiheita, vol. 39, no. 2, pp. 87-105, Tekniikan Historian Seura ry, July, 2021. bib pdf link
Mikko Koho, Esko Ikkala, Petri Leskinen, Minna Tamper, Jouni Tuominen and Eero Hyvönen: WarSampo Knowledge Graph: Finland in the Second World War as Linked Open Data. Semantic Web – Interoperability, Usability, Applicability, vol. 12, no. 2, pp. 265-278, January, 2021. bib pdf link
The Second World War (WW2) is arguably the most devastating catastrophe of human history, a topic of great interest to not only researchers but the general public. However, data about the Second World War is heterogeneous and distributed in various organizations and countries making it hard to utilize. In order to create aggregated global views of the war, a shared ontology and data infrastructure is needed to harmonize information in various data silos. This makes it possible to share data between publishers and application developers, to support data analysis in Digital Humanities research, and to develop data-driven intelligent applications. As a first step towards these goals, this article presents the WarSampo knowledge graph (KG), a shared semantic infrastructure, and a Linked Open Data (LOD) service for publishing data about WW2, with a focus on Finnish military history. The shared semantic infrastructure is based on the idea of representing war as a spatio-temporal sequence of events that soldiers, military units, and other actors participate in. The used metadata schema is an extension of CIDOC CRM, supplemented by various military historical domain ontologies. With an infrastructure containing shared ontologies, maintaining the interlinked data brings upon new challenges, as one change in an ontology can propagate across several datasets that use it. To support sustainability, a repeatable automatic data transformation and linking pipeline has been created for rebuilding the whole WarSampo KG from the individual source datasets. The WarSampo KG is hosted on a data service based on W3C Semantic Web standards and best practices, including content negotiation, SPARQL API, download, automatic documentation, and other services supporting the reuse of the data. The WarSampo KG, a part of the international LOD Cloud and totalling ca. 14 million triples, is in use in nine end-user application views of the WarSampo portal, which has had over 400 000 end users since its opening in 2015.

2020

Eero Hyvönen: Semantic Sampo Portals for Digital Humanities Based on a National Linked Open Data Infrastructure. 2020. White paper, Aalto University, Semantic Computing Research Group (SeCo). bib pdf
Eero Hyvönen: Sampo Model and Semantic Portals for Digital Humanities on the Semantic Web. DHN 2020 Digital Humanities in the Nordic Countries. Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, pp. 373-378, CEUR Workshop Proceedings, vol. 2612, Riga, Latvia, October, 2020. bib pdf link
Mikko Koho, Petri Leskinen and Eero Hyvönen: Integrating Historical Person Registers as Linked Open Data in the WarSampo Knowledge Graph. Semantic Systems. In the Era of Knowledge Graphs. SEMANTiCS 2020 (Eva Blomqvist, Paul Groth, Victor de Boer, Tassilo Pellegrini, Mehwish Alam, Tobias Käfer, Peter Kieseberg, Sabrina Kirrane, Albert Meroño-Peñuela and Harshvardhan J. Pandit (eds.)), Lecture Notes in Computer Science, vol. 12378, pp. 118-126, Springer, Cham, Amsterdam, The Netherlands, October, 2020. bib pdf link
Semantic data integration from heterogeneous, distributed data silos enables Digital Humanities research and application development employing a larger, mutually enriched and interlinked knowledge graph. However, data integration is challenging, involving aligning the data models and reconciling the concepts and named entities, such as persons and places. This paper presents a record linkage process to reconcile person references in different military historical person registers with structured metadata. The information about persons is aggregated into a single knowledge graph. The process was applied to reconcile three person registers of the popular semantic portal WarSampo -- Finnish World War 2 on the Semantic Web. The registers contain detailed information about some 100,000 people and are individually maintained by domain experts. Thus, the integration process needs to be automatic and adaptable to changes in the registers. An evaluation of the record linkage results is promising and provides some insight into military person register reconciliation in general.
Mikko Koho: Representing, Using, and Maintaining Military Historical Linked Data on the Semantic Web. Dissertation, Aalto University, School of Science, Department of Computer Science, May, 2020. bib pdf link
Eero Hyvönen: Tekoäly ja semanttinen web tarjoavat huikeat mahdollisuudet suurten aineistojen hallintaan (Artificial Intelligence and the Semantic Web Offer Tremendous Opportunities for Managing Large Datasets). Kansallisarkiston strategia 2025, näkökulmia tulevaan (Jussi Nuorteva, Päivi Happonen (eds.)), pp. 20-21, Kansallisarkisto, Helsinki, May, 2020. bib link

2019

Mikko Koho, Erkki Heino, Petri Leskinen, Esko Ikkala, Minna Tamper, Kasper Apajalahti, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen: WarSampo Knowledge Graph. Zenodo, October, 2019. Dataset. bib link
WarSampo Knowledge Graph includes harmonized data of different kinds concerning the Second World War in Finland, separated in different subgraphs representing events, actors, places, photographs, and other aspects and documentation of the war. The data covers the Winter War 1939-1940 against the Soviet attack, the Continuation War 1941-1944 where the occupied areas of the Winter War were temporarily regained, and the Lapland War 1944-1945, where the Finns pushed the German troops away from Lapland.
Eero Hyvönen: Linked Data in Use: Sampo Portals on the Semantic Web. EuropaNow, Council for European Studies (CES), Columbia University, September, 2019. bib pdf link
Lia Gasbarra, Mikko Koho, Ilkka Jokipii, Heikki Rantala and Eero Hyvönen: An Ontology of Finnish Historical Occupations. The Semantic Web: ESWC 2019 Satellite Events (Hitzler, Pascal, Kirrane, Sabrina, Hartig, Olaf, de Boer, Victor, Vidal, Maria-Esther, Maleshkova, Maria, Schlobach, Stefan, Hammar, Karl, Lasierra, Nelia, Stadtmüller, Steffen, Hose, Katja and Verborgh, Ruben (eds.)), Lecture Notes in Computer Science, pp. 64-68, Springer, Cham, Portoroz, Slovenia, June, 2019. bib pdf link
Historical datasets often impose the need to study groups of people based on occupation or social status. This paper presents first results in creating an ontology of historical Finnish occupations, AMMO, that enables selection of groups of people based on their occupation, occupational groups, or socioeconomic class. AMMO is linked to the international historical occupation classification HISCO and to a modern Finnish occupational classification for interoperability. AMMO will be used as a component in two semantic portals for Finnish war history.
Mikko Koho, Lia Gasbarra, Jouni Tuominen, Heikki Rantala, Ilkka Jokipii and Eero Hyvönen: AMMO Ontology of Finnish Historical Occupations. Proceedings of the First International Workshop on Open Data and Ontologies for Cultural Heritage (ODOCH 19) (Antonella Poggi (ed.)), vol. 2375, pp. 91-96, CEUR Workshop Proceedings, Rome, Italy, June, 2019. bib pdf link
This paper introduces AMMO Ontology of Finnish Historical Occupations. AMMO is based on thousands of occupation labels extracted from three Finnish military historical datasets of the early 20th century: the first consists of the ca. 40 000 war-related death records around the time of the Finnish Civil War (1914–1922); the second consists of the ca. 95 000 death records of Finnish soldiers in the Winter War and Continuation War (1939–1944); the third contains the ca. 4500 records of Finnish prisoners of war in the Soviet Union during the WW2. Our goal from a Digital Humanities perspective is to use AMMO to study military history and these datasets based on the occupation and social status of the soldiers. AMMO will also be used as a component for faceted search and semantic recommendation in two semantic portals for Finnish military history. AMMO is aligned with the international historical occupation classification HISCO and with a modern Finnish occupational classification for international and national interoperability. The ontology is published as Linked Open Data in an ontology service.

2018

Mikko Koho, Esko Ikkala, Erkki Heino and Eero Hyvönen: Maintaining a Linked Data Cloud and Data Service for Second World War History. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, vol. 11196, Springer-Verlag, October-November, 2018. bib pdf link
Mikko Koho, Esko Ikkala and Eero Hyvönen: How to Maintain a Linked Data Cloud in a Deployed Semantic Portal. Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks, CEUR Workshop Proceedings, Monterey, California, USA, October, 2018. Vol 2180. bib pdf link
Petri Leskinen, Goki Miyakita, Mikko Koho and Eero Hyvönen: Combining Faceted Search with Data-analytic Visualizations on Top of a SPARQL Endpoint. Proceedings of VOILA 2018, Monterey, California. CEUR Workshop Proceedings, Vol. 2187, October, 2018. bib pdf
Esko Ikkala, Eero Hyvönen and Jouni Tuominen: An Ontology of World War II Places for Linking and Enriching Heterogeneous Historical Data Sources. Abstracts, 17th International Conference of Historical Geographers (ICHG 2018), No. 194, Warsaw, Poland, July, 2018. bib pdf
Mikko Koho, Erkki Heino, Esko Ikkala, Eero Hyvönen, Reijo Nikkilä, Tiia Moilanen, Katri Miettinen and Pertti Suominen: Integrating Prisoners of War Dataset into the WarSampo Linked Data Infrastructure. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), CEUR Workshop Proceedings, Helsinki, Finland, March, 2018. Vol 2084. bib pdf link
One of the great promises of Linked Data and the Semantic Web standards is to provide a shared data infrastructure into which more and more data can be imported and aligned, forming a sustainable, ever growing knowledge graph or linked data cloud, Web of Data. This paper studies and evaluates this idea in the context of the WarSampo Linked Data cloud, providing an infrastructure for data related to the Second World War in Finland. As a case study, a new database of prisoners of war with related contents is transformed into linked data and integrated into WarSampo. Lessons learned are discussed in relation to using traditional data publishing approaches.

2017

Mikko Koho, Eero Hyvönen, Erkki Heino, Jouni Tuominen, Petri Leskinen and Eetu Mäkelä: Linked Death - Representing, Publishing, and Using Second World War Death Records as Linked Open Data. The Semantic Web: ESWC 2017 Satellite Events (Eva Blomqvist, Katja Hose, Heiko Paulheim, Agnieszka Ławrynowicz, Fabio Ciravegna and Olaf Hartig (eds.)), pp. 369-383, Springer, Cham, 2017. bib pdf link
War history of the Second World War (WW2), humankind’s largest disaster, is of great interest to both laymen and researchers. Most of us have ancestors and relatives who participated in the war, and in the worst case got killed. Researchers are eager to find out what actually happened then, and even more importantly why, so that future wars could perhaps be prevented. The darkest data of war history are casualty records—from such data we could perhaps learn most about the war. This paper presents a model and system for representing death records as linked data, so that 1) citizens could find out more easily what happened to their relatives during WW2 and 2) digital humanities (DH) researchers could (re)use the data easily for research.
Esko Ikkala, Mikko Koho, Erkki Heino, Petri Leskinen, Eero Hyvönen and Tomi Ahoranta: Prosopographical Views to Finnish WW2 Casualties Through Cemeteries and Linked Open Data. Proceedings of the Workshop on Humanities in the Semantic Web (WHiSe II), CEUR Workshop Proceedings, Vienna, Austria, October, 2017. bib pdf link
This paper presents an application for studying the death records of WW2 casualties from a prosopographical perspective, provided by the various local military cemeteries where the dead were buried. The idea is to provide the end user with a global visual map view on the places in which the casualties were buried as well as with a local historical perspective on what happened to the casualties that lay within a particular cemetery of a village or town. Plenty of data exists about the Second World War (WW2), but the data is typically archived in unconnected, isolated silos in different organizations. This makes it difficult to track down, visualize, and study information that is contained within multiple distinct datasets. In our work, this problem is solved using aggregated Linked Open Data provided by the WarSampo Data Service and SPARQL endpoint.
Petri Leskinen, Mikko Koho, Erkki Heino, Minna Tamper, Esko Ikkala, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen: Modeling and Using an Actor Ontology of Second World War Military Units and Personnel. Proceedings of the 16th International Semantic Web Conference (ISWC 2017) (Claudia d'Amato, Miriam Fernandez, Valentina Tamma, Freddy Lecue, Philippe Cudré-Mauroux, Juan Sequeda, Christoph Lange and Jeff Heflin (eds.)), pp. 280-296, Springer-Verlag, Vienna, Austria, October, 2017. bib pdf link
This paper presents a model for representing historical military personnel and army units, based on large datasets about World War II in Finland. The model is in use in WarSampo data service and semantic portal, which has had tens of thousands of distinct visitors. A key challenge is how to represent ontological changes, since the ranks and units of military personnel, as well as the names and structures of army units change rapidly in wars. This leads to serious problems in both search as well as data linking due to ambiguity and homonymy of names. In our solution, actors are represented in terms of the events they participated in, which facilitates disambiguation of personnel and units in different spatio-temporal contexts. The linked data in the WarSampo Linked Open Data cloud and service has ca. 9 million triples, including actor datasets of ca. 100 000 soldiers and ca. 16 100 army units. To test the model in practice, an application for semantic search and recommending based on data linking was created, where the spatio-temporal life stories of individual soldiers can be reassembled dynamically by linking data from different datasets. An evaluation is presented showing promising results in terms of linking precision.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo: Publishing and Using Linked Open Data about the Second World War. EuropeanaTech Insight, no. 7, Europeana, September, 2017. bib pdf link
The article overviews the system WarSampo – Finnish World War 2 on the Semantic Web, the winner of the LODLAM Challenge 2017 Open Data Prize on June 29 in Venice, Italy.
Erkki Heino, Minna Tamper, Eetu Mäkelä, Petri Leskinen, Esko Ikkala, Jouni Tuominen, Mikko Koho and Eero Hyvönen: Named Entity Linking in a Complex Domain: Case Second World War History. Proceedings, Language, Data and Knowledge (LDK 2017), pp. 120-133, Springer-Verlag, Galway, Ireland, June, 2017. bib pdf link
This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of 1) military units, 2) places and 3) people in the context of rich Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced, and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data as well as the target authority.
Erkki Heino: Sotahistorian kuvaaminen ja rikastaminen linkitettynä datana (Representing and Enriching War History as Linked Data). MSc Thesis (in Finnish), University of Helsinki, Department of Computer Science, June, 2017. bib pdf link
Linked data makes it possible to combine separate datasets, and the resulting whole enables a better understanding of the information in them. With links between datasets, new information can be inferred more easily than by examining the datasets separately. This thesis deals with modeling and publishing military historical datasets as linked open data, and with automatically enriching the datasets by means of other datasets. The goal of the work was to find out how such datasets should be modeled as linked data, how they should be combined with other datasets, what added value this yields, and how the datasets should be visualized. The datasets used were events digitized from non-fiction books and the metadata of the photographs in the SA-kuva photograph service of the Military Museum (Sotamuseo). The datasets were modeled using the CIDOC CRM standard and enriched by automatically linking the resources they contain using person, military unit, and place ontologies. The event-centric modeling approach defined by CIDOC CRM makes the datasets interoperable not only with each other but also with other historical datasets. Automatic enrichment involved many challenges, since references to other datasets had to be extracted largely from textual descriptions, which raises the problem of recognizing and disambiguating named entities such as people and places in text. The thesis discusses these challenges, presents the solutions used, and evaluates how well they work. JavaScript applications were also implemented to visualize the datasets. The datasets and applications have been published as part of the WarSampo (Sotasampo) portal, which forms an interlinked whole of various datasets related to the Second World War in Finland. The portal serves not only citizens interested in Finnish history and in the movements of their relatives who fought in the wars, but also history researchers, by offering the datasets in a freely queryable structured form.
Minna Tamper, Petri Leskinen, Esko Ikkala, Arttu Oksanen, Eetu Mäkelä, Erkki Heino, Jouni Tuominen, Mikko Koho and Eero Hyvönen: AATOS – a Configurable Tool for Automatic Annotation. Proceedings, Language, Data and Knowledge (LDK 2017), pp. 276-289, Springer-Verlag, Galway, Ireland, June, 2017. bib pdf link
This paper presents an automatic annotation tool AATOS for providing documents with semantic annotations. The tool links entities found from the texts to ontologies defined by the user. The application is highly configurable and can be used with different natural language Finnish texts. The application was developed as a part of WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies impacted the results.

2016

Eero Hyvönen: Cultural Heritage Linked Data on the Semantic Web: Three Case Studies Using the Sampo Model. VIII Encounter of Documentation Centres of Contemporary Art: Open linked data and integral management of information in cultural centres, 2016. Artium, Vitoria-Gasteiz, Spain, October 19-20, 2016. bib pdf
A major challenge in publishing linked Cultural Heritage (CH) collections on the web is interoperability. This is due to the heterogeneity of CH contents and the distributed content creation model where publishers focus on their own data with little consideration on the others’ data. As a solution approach, the “Sampo” model is presented based on using domain independent modeling standards, on a model for aligning metadata models, and on sharing domain ontologies for populating the metadata models. The harmonized data is published for machines as a linked data service, to be used by applications for human users. To illustrate and evaluate the model, three online systems on the Web, CultureSampo, BookSampo, and WarSampo are presented.
Petri Leskinen: Sotilashenkilöiden ja joukko-osastojen mallintaminen ja käyttö toimijaontologiana (Modeling and Using Military Persons and Units as an Actor Ontology). MSc Thesis (in Finnish), Aalto University, School of Science, Degree Programme in Computer Science and Engineering, Dec, 2016. bib pdf
An actor ontology models persons and groups of persons in linked open data. The purpose of the actor ontology model is to make it possible to bring together material from different sources and publish it in a uniform format, so that the information can be used both in digital humanities research and by offering user interfaces for browsing the material in visual form. The ontology follows the actor-event model, in which an actor is modeled as the sum of the biographical events related to him or her. The solutions are based on the CIDOC CRM standard, which was chosen to ensure that the model is easy to extend and to follow a uniform publishing practice for cultural-historical data. The work was done as part of the broader WarSampo (Sotasampo) project, in which a comprehensive database of Second World War material concerning Finland was collected. My own contribution in this work was creating the actor ontology model and populating it with military persons and units. The data has been published as open data (http://www.ldf.fi/dataset/warsa) and can be browsed in the WarSampo portal (http://www.sotasampo.fi).
Minna Tamper: Extraction of Entities and Concepts from Finnish Texts. MSc Thesis (in English), Aalto University, School of Science, Degree Programme in Computer Science and Engineering, Dec, 2016. bib pdf
Keywords are used in many document databases to improve search. The process of assigning keywords from controlled vocabularies to a document is called subject indexing. If the controlled vocabulary used for indexing is an ontology, with semantic relations and descriptions of concepts, the process is also called semantic annotation. In this thesis an automatic annotation tool was created to provide documents with semantic annotations. The application links entities found in the texts to ontologies defined by the user. The application is highly configurable and can be used with different Finnish texts. The application was developed as a part of the WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies, impacted the results.
Esko Ikkala: Suomalainen historiallisten paikkojen ja karttojen ontologiapalvelu. MSc Thesis (in Finnish), Aalto University, School of Electrical Engineering, Degree Programme of Automation and Systems Technology, August, 2016. bib pdf link
Historical geographic information plays a central role in the management and use of the collections of memory organizations and in digital humanities research. Handling geographic information in systems other than specialized geographic information systems, together with the temporal dimension of geographic information, brings numerous challenges for which linked data technologies have offered promising solutions. This thesis presents HIPLA, a new service model for historical places and maps, based on linked data technologies and developed for the needs of cultural heritage organizations. The goal of the HIPLA service model is to provide a common view of geographic information managed by different organizations and to enable collaborative enrichment, search, and browsing of distributed geographic datasets on both modern and historical maps. In addition, a prototype application, Hipla.fi, was implemented to illustrate the benefits of the HIPLA service model, and it was piloted as part of the WarSampo (Sotasampo) project, which publishes materials on the Winter War and the Continuation War as linked open data. The pilot resulted in a place ontology of the Winter War and the Continuation War, which provides a tool for automatically linking war-related materials and visualizing them geographically.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: Publishing Second World War History as Linked Data Events on the Semantic Web. Proceedings of Digital Humanities 2016, short papers, pp. 571-573, Kraków, Poland, July, 2016. bib pdf link
Data about wars is typically heterogeneous, distributed in the data silos of the fighting parties, multilingual, and often controversial depending on the political point of view. It is therefore hard for historians to get a global picture of what has actually happened, to whom, where, when, and how. We argue that Semantic Web and Linked Data technologies are a very promising approach for modeling, harmonizing, and aggregating data about war history. Our goal is to make it possible, for both historians and laymen, to study history in a contextualized way where linked datasets enrich each other. The paper presents the in-use WarSampo system, where massive collections of heterogeneous data about the (Finnish) history of the Second World War are harmonized using an event-based approach, and provided as a Linked Open Data service for applications to use. As a use case, a semantic portal WarSampo providing six different perspectives to the war based on events is presented.
Eero Hyvönen, Erkki Heino, Petri Leskinen, Esko Ikkala, Mikko Koho, Minna Tamper, Jouni Tuominen and Eetu Mäkelä: WarSampo Data Service and Semantic Portal for Publishing Linked Open Data about the Second World War History. The Semantic Web – Latest Advances and New Domains (ESWC 2016) (Harald Sack, Eva Blomqvist, Mathieu d'Aquin, Chiara Ghidini, Simone Paolo Ponzetto and Christoph Lange (eds.)), pp. 758-773, Springer-Verlag, May, 2016. bib pdf link
This paper presents the WarSampo system for publishing collections of heterogeneous, distributed data about the Second World War on the Semantic Web. WarSampo is based on harmonizing massive datasets using event-based modeling, which makes it possible to enrich datasets semantically with each other's contents. WarSampo has two components: First, a Linked Open Data (LOD) service WarSampo Data for Digital Humanities (DH) research and for creating applications related to war history. Second, a semantic WarSampo Portal has been created to test and demonstrate the usability of the data service. The WarSampo Portal allows both historians and laymen to study war history and destinies of their family members in the war from different interlinked perspectives. Published in November 2015, the WarSampo Portal had some 20,000 distinct visitors during the first three days, showing that the public has a great interest in this kind of application.
Mikko Koho, Eero Hyvönen, Erkki Heino, Jouni Tuominen, Petri Leskinen and Eetu Mäkelä: Linked Death - Representing, Publishing, and Using Second World War Death Records as Linked Open Data. Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe), CEUR Workshop Proceedings, Heraklion, Crete, Greece, May, 2016. Vol 1608. bib pdf link
War history of the Second World War (WW2), humankind's largest disaster, is of great interest to both laymen and researchers. Most of us have ancestors and relatives who participated in the war, and in the worst case got killed. Researchers are eager to find out what actually happened then, and even more importantly why, so that future wars could perhaps be prevented. The darkest data of war history are casualty records; from such data we could perhaps learn most about the war. This paper presents a model and system for representing death records as linked data, so that 1) citizens could find out more easily what happened to their relatives during WW2 and 2) digital humanities (DH) researchers could (re)use the data easily for research.

2015

Eero Hyvönen, Jouni Tuominen, Eetu Mäkelä, Jérémie Dutruit, Kasper Apajalahti, Erkki Heino, Petri Leskinen and Esko Ikkala: Second World War on the Semantic Web: The WarSampo Project and Semantic Portal. Proceedings of the ISWC 2015 Posters & Demonstrations Track, CEUR-WS Proceedings, Bethlehem, PA, USA, October, 2015. Vol 1486. bib pdf link
This paper initiates and fosters work on publishing Linked Open Data about the Second World War. It is argued that the heterogeneous, distributed data about the international world war history makes a promising use case for semantic technologies. We hope that by making war data openly available we can learn from the past and promote peace.