FIN-CLARIAH research infrastructure - Semantic Computing Research Group (SeCo)

What is FIN-CLARIAH?

FIN-CLARIAH (2022-) is the premier Finnish digital research infrastructure for Social Sciences and Humanities (SSH), comprising two components:

FIN-CLARIN (Finnish dimension of the Pan-European CLARIN infrastructure) and
DARIAH-FI (Finnish collaborations with the Pan-European DARIAH infrastructure).

In their first common development project, the FIN-CLARIAH components seek to significantly broaden their mutual scope of digital SSH infrastructural support by consolidating and enhancing their resources with three major goals:

Reach beyond processing of spoken standard Finnish into colloquial speech
Cater to a broad range of SSH research needs for processing unstructured text
Facilitate research based on metadata

The SSH field has not historically been at the forefront of the use of digital technology. However, the field in Finland has the potential to undergo a digital transformation. The aim of FIN-CLARIAH is to ensure that this transformation happens in an orderly fashion, without duplicating effort or reinventing the wheel.

FIN-CLARIAH involves all Finnish universities with research in SSH, including the coordinator University of Helsinki (Faculty of Humanities, Faculty of Social Sciences, and National Library), CSC – IT Center for Science Ltd., Aalto University, Tampere Universities, the University of Eastern Finland, and the universities of Jyväskylä and Turku. In addition, the project collaborators of FIN-CLARIAH are the universities of Vaasa and Oulu, the Institute for the Languages of Finland, and the National Archives of Finland.

Our Mission: Finnish Linked Open Data Infrastructure for Digital Humanities (LODI4DH)

The Aalto work in FIN-CLARIAH focuses on maintaining and further developing the Linked Open Data Infrastructure for Digital Humanities in Finland (LODI4DH) in collaboration with the University of Helsinki (HELDIG, Faculty of Humanities) and other partners within the DARIAH-FI part of FIN-CLARIAH. It also covers language infrastructures for spinning the Semantic Web and collaborations with FIN-CLARIN and CLARIN-EU. Our work is part of the cooperative partnership agreement between Aalto and DARIAH-EU.

The vision and results of our work by 2024 are summarized in the presentation below, given at the DCMI 2024 conference in Toronto, Canada:

Figure 1. Elements of national semantic web infrastructure

Figure 1 depicts the elements needed for developing a national Semantic Web infrastructure, according to the experiences reported in this paper. The system is based on domain-agnostic W3C Web Standards and Best Practices for publishing Linked Data (on the lower left of the figure). Data Models are needed for representing metadata and knowledge of different application domains, populated by resources taken from shared domain Ontologies and Ontology Services for interoperability. The ontologies should be made openly available and easy to access for interoperability and re-use, based on shared ontology services/libraries. In the same vein, data services for publishing LD datasets, preferably under open licenses such as Creative Commons, are needed to make re-use of data possible and easy. Applications of Linked Data are also part of the infrastructure, connecting the system to its end users. To make all this possible, Software Tools are needed for aggregating distributed, heterogeneous data from legacy and other data silos, and for extracting and linking (disambiguating) entities and relations from data records and textual descriptions. Tools for data publishing and analysis are needed as well, along with tooling for developing new applications for the end users.

Since 2001, the SeCo group has been working on publishing and using linked data of Cultural Heritage on the Semantic Web and in Digital Humanities. In FIN-CLARIAH our goal is to make selected results of this work available to external users through the pipeline and components outlined below. The work proceeds step by step, starting from the more mature software tools and services that have already been used in our earlier research projects.

We hope that the most mature parts of the infrastructure, linked data, and applications, now maintained by the Semantic Computing Research Group (SeCo) at Aalto University and the University of Helsinki, will be gradually deployed by the data owners and users in the Finnish Cultural Heritage sector, such as the National Archives, Finnish Literature Society, Finnish Heritage Agency, Finnish Institute for Languages, National Library, Ministry of Justice, and Parliament of Finland. Data from these organizations and others have been enriched, linked, and published on the Linked Data Finland platform, and used in the Sampo portals in use in Finland. Finding a sustainable solution for maintaining the services and the underlying infrastructure through the work in FIN-CLARIAH would be desirable.

Implementation: Supporting Infrastructure Pipeline and Components

Our work in FIN-CLARIAH falls into several areas that need to be covered in order to create data and services for DH research:

Speech2Text. Tooling for creating time-stamped textual representations of videos and audio recordings. Here the goal is, e.g., to facilitate the preservation of intangible cultural heritage and easy access to it, as in the WarMemoirSampo system that publishes interview memoirs of WW2 veterans.
Image2Text. OCR services developed, e.g., for the historical minutes of the Parliament of Finland in the ParliamentSampo system.
Text2Knowledge. Finnish language toolkit & web services for linked data knowledge extraction from unstructured Finnish texts, including named entity recognition and linking, automatic keyword annotation, relation extraction, and semantic labeling. This work has been carried out, e.g., in our various systems related to biographical texts, such as BiographySampo and AcademySampo.
Knowledge2DataAnalysis. Reusable tooling for Digital Humanities on top of a linked data service and SPARQL endpoint, as used in various Sampo systems.
DataAnalysis2AI. Tooling for knowledge discovery and computational creativity. Here the machine is seen as an intelligent agent that by itself searches for interesting patterns in knowledge graphs, solves problems, and even explains the results to the human user (to support "third-generation DH systems" as suggested in this paper).
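
The Knowledge2DataAnalysis step above assumes that a linked data service exposes a SPARQL endpoint. As a minimal sketch of what querying such an endpoint involves, the Python snippet below builds a SPARQL 1.1 Protocol GET URL and flattens a response given in the standard SPARQL JSON results format. The endpoint URL, the query, and the sample response are hypothetical placeholders, not actual SeCo services or data, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# Illustrative query: list a few resources together with their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person rdfs:label ?label .
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Made-up response in the SPARQL JSON results format (not real data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["person", "label"]},
    "results": {"bindings": [
        {
            "person": {"type": "uri", "value": "https://example.org/person/1"},
            "label": {"type": "literal", "value": "Example Person"},
        }
    ]},
})

if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
    for row in rows_from_results(SAMPLE_RESPONSE):
        print(row["person"], row["label"])
```

In practice the URL would be fetched with an HTTP client, requesting the `application/sparql-results+json` media type, and the flattened rows could then be passed on to whatever analysis tooling is in use.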

Infrastructure components to be maintained and built in our part of the FIN-CLARIAH initiative include:

ONKI ontology services (ONKI.fi) for history, extending the Finto.fi services of the National Library. This work comprises ontologies for historical persons, places, events, times, occupations, and names.
Historical map services (Hipla.fi). Here historical maps can be aligned with contemporary ones and used as layers in applications, based on the MapWarper tool and linked data for storing related metadata.
Linked Data Finland (LDF.fi). This platform is used for publishing linked data as services using the standards and best practices of W3C. Our focus here is on using the “7-star” model, which extends Tim Berners-Lee's classic 5-star model, for better reusability and quality of linked datasets.
Natural language processing toolkit and services for extracting linked data.
Learning materials. Providing the DH community with more educational online material on using linked data, such as developing the Linked Data School Linda.
Maintaining the Sampo Series of linked open data services and semantic portals in use in Finland and the Sampo-UI framework for developing Sampo applications. In particular, the following Sampos are initially in focus:
- NameSampo (main data owners: Finnish Institute of Languages, National Survey)
- BiographySampo (main data owners: Finnish Literature Society (SKS), Edita Publishing, and others)
- WarSampo, WarVictimSampo 1914–1922, and WarMemoirSampo (main data owners: National Archives, Defence Forces, Tammenlehvän Perinneliitto ry)
- AcademySampo (main data owners: University of Helsinki Archives, National Archives)
- FindSampo (main data owners: Finnish Heritage Agency, National Museum, British Museum (UK))
- Mapping Manuscript Migrations Sampo (main data owners: Oxford University (UK), Schoenberg Institute (US), IRHT (Paris))
- LetterSampo (main data owners: Huygens Institute (NL), Berlin-Brandenburg Academy of Sciences (D), Oxford University (UK))
- LawSampo (main data owners: Ministry of Justice, Edita Publishing)
- ParliamentSampo (main data owners: Parliament of Finland, Finnish Literature Society)
- LetterSampo Finland (main data owners: Various Finnish archives for epistolary data (letters), including National Archives, National Library, National Gallery, Åbo Academy, Finnish Literature Society, Svenska Litteratursällskapet i Finland, and many others)
- OperaSampo (main data owner: Sibelius Academy)
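
The “7-star” model mentioned for Linked Data Finland in the component list above can be illustrated as a cumulative rating. The sketch below assumes the usual reading of the 5-star model (open license, structured data, non-proprietary format, URIs for things, links to other data) and, as an assumption based on our understanding of the published model, takes star 6 to mean that the schemas used are published with the dataset and star 7 to mean that the data has been validated against them. The class and field names are hypothetical and not part of any LDF.fi API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist describing how a dataset is published."""
    open_license: bool = False              # star 1: on the Web under an open license
    structured: bool = False                # star 2: machine-readable structured data
    open_format: bool = False               # star 3: non-proprietary format
    uses_uris: bool = False                 # star 4: URIs identify things (RDF, SPARQL)
    linked_to_other_data: bool = False      # star 5: links to other datasets
    schema_published: bool = False          # star 6 (assumed wording): schemas published
    validated_against_schema: bool = False  # star 7 (assumed wording): data validated


# Criteria in star order; each star presupposes all earlier ones.
CRITERIA = [
    "open_license",
    "structured",
    "open_format",
    "uses_uris",
    "linked_to_other_data",
    "schema_published",
    "validated_against_schema",
]


def star_rating(dataset):
    """Count consecutive satisfied criteria, stopping at the first unmet one."""
    stars = 0
    for name in CRITERIA:
        if not getattr(dataset, name):
            break
        stars += 1
    return stars


if __name__ == "__main__":
    five_star = Dataset(True, True, True, True, True)
    print(star_rating(five_star))
```

Treating the stars as cumulative means that a dataset with a published schema but no links to other data still counts as 4-star in this sketch; whether the actual model is applied that strictly is an assumption.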

More Information about the Infrastructure

The following short presentation at the DARIAH Annual Meeting 2023 in Budapest gives an overview of our work related to FIN-CLARIAH:

Finnish LOD Infra and Sampo portals, DARIAH Annual Meeting, Budapest, 2023 from SeCo Research Group on Vimeo.

The keynote presentation video of the DCMI 2021 conference below, the related papers How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web and Digital Humanities on the Semantic Web: Sampo model and portal series, and the other papers listed below give an overview of our work on developing a national Semantic Web infrastructure in Finland and its applications. For a full account of SeCo research on this topic, see the SeCo Publications List.

Making National Linked Open Data Services Sustainable

Here is a video (in Finnish) suggesting one way to make the Linked Open Data services and Sampo portals sustainable. Would establishing a joint collaborative Linked Open Data Centre run by the memory organizations be a feasible solution?

Ehdotus Sampo-portaalien ja -datapalveluiden vakinaistamiseksi from SeCo Research Group on Vimeo.

The FIN-CLARIAH work is funded by the Research Council of Finland under the NextGeneration funding programme of the European Union, as part of the national research infrastructure programme FIRI 2021. The first phase of the initiative ran from 2022 to 2023 and the second from 2024 to 2025.

Contact

Professor Eero Hyvönen
Aalto University and University of Helsinki (Helsinki Centre for Digital Humanities HELDIG)

Publications

2026

Annastiina Ahola, Lilli Peura, Rafael Leal, Heikki Rantala and Eero Hyvönen: Using generative AI and LLMs to enrich art collection metadata for searching, browsing, and studying art history in Digital Humanities. Humanizing Technology, Volume III - Artificial Intelligence and the Humanities (Silvia Lima, Gonçalves Araújo, Micaela Aguiar, Dalila Durães (ed.)), Peter Lang Verlag, March, 2026. In press.

Eero Hyvönen, Petri Leskinen, Heikki Rantala and Jouni Tuominen: Using ParliamentSampo Linked Open Data Service and Portal for Analyzing Interruptions and Laughter in the Plenary Sessions of the Parliament of Finland. Posters, Demos, Blue Sky, and Tutorials at SEMANTiCS 2026, Sep 2026, Ghent, Belgium, CEUR Workshop Proceedings, 2026. Forthcoming.

2025

Eero Hyvönen, Laura Sinikallio, Petri Leskinen, Senka Drobac, Rafael Leal, Matti La Mela, Jouni Tuominen, Henna Poikkimäki and Heikki Rantala: Publishing and Using Parliamentary Linked Data on the Semantic Web: ParliamentSampo System for Parliament of Finland. Semantic Web, vol. 16, no. 1, 2025. DOI: 10.3233/SW-243683.

Ilona Pikkanen, Matti La Mela, Hanna-Leena Paloposki and Jouni Tuominen: A Critical Collection History of Nineteenth-century Women’s Letters: Overcoming the Occluded Archive with Data-Driven Methods. Digital Humanities Quarterly, vol. 19, no. 4, 2025.

Eero Hyvönen: Serendipitous knowledge discovery on the Web of Wisdom based on searching and explaining interesting relations in knowledge graphs. Journal of Web Semantics, vol. 85, Elsevier, May, 2025. DOI: 10.1016/j.websem.2024.100852.

2024

Eero Hyvönen: How to Create and Use a National Cross-domain Ontology and Data Infrastructure on the Semantic Web. Semantic Web – Interoperability, Usability, Applicability, vol. 15, no. 4, pp. 1499-1513, 2024. DOI: 10.3233/SW-243468.

Eero Hyvönen and Jouni Tuominen: 8-star Linked Open Data Model: Extending the 5-star Model for Better Reuse, Quality, and Trust of Data. Posters, Demos, Workshops, and Tutorials of the 20th International Conference on Semantic Systems (SEMANTiCS 2024), vol. 3759, CEUR Workshop Proceedings, September, 2024.

Eero Hyvönen: Sampo-järjestelmien verkosto avaa linkitettyä kulttuuridataa tutkijoille ja kansalaisille semanttisessa webissä. Tieteessä tapahtuu, no. 2, 2024.

2023

Eero Hyvönen: Creating and Using a Linked Open Ontology and Data Infrastructure for Digital Humanities in Finland: Lessons Learned 2003-2023. Paper presented at the DARIAH-EU Annual Event 2023, Budapest, June, 2023.

Minna Tamper, Laura Sinikallio, Jouni Tuominen and Eero Hyvönen: Transforming Linguistically Annotated Finnish Parliamentary Debates Into the Parla-CLARIN Format. Digital Humanities in the Nordic and Baltic Countries Seventh Conference (DHNB 2023), Book of Abstracts (Sofie Gilbert and Annika Rockenberger (eds.)), p. 118, University of Oslo Library, Oslo, Norway, March, 2023.

Eero Hyvönen: How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Programming and Data Infrastructure in Digital Humanities, Book of Abstracts, p. 7, High Performance Computing Centre, University of Évora, Portugal, March, 2023.

Eero Hyvönen: Digital Humanities on the Semantic Web: Sampo Model and Portal Series. Semantic Web – Interoperability, Usability, Applicability, vol. 14, no. 4, pp. 729-744, IOS Press, 2023.

2020

Eero Hyvönen: Using the Semantic Web in Digital Humanities: Shift from Data Publishing to Data-analysis and Serendipitous Knowledge Discovery. Semantic Web, vol. 11, no. 1, pp. 187-193, 2020.