FIN-CLARIAH Research Infrastructure:
Developing a National Linked Open Data Infrastructure

What is FIN-CLARIAH?

FIN-CLARIAH (2022-) is the premier Finnish digital research infrastructure for Social Sciences and Humanities (SSH) comprising two components,

  1. FIN-CLARIN (Finnish dimension of the Pan-European CLARIN infrastructure) and
  2. DARIAH-FI (Finnish collaborations with the Pan-European DARIAH infrastructure).
In their first common development project, the FIN-CLARIAH components seek to significantly broaden their mutual scope of digital SSH infrastructural support by consolidating and enhancing their resources with three major goals:
  1. Reach beyond processing of spoken standard Finnish into colloquial speech
  2. Cater to a broad range of SSH research needs for processing unstructured text
  3. Facilitate research based on metadata

The SSH fields have historically not been at the forefront of the use of digital technology. However, the field in Finland has the potential to undergo such a digital transformation. The aim of FIN-CLARIAH is to ensure that this transformation happens in an orderly fashion, without duplicated effort or reinventing the wheel.

FIN-CLARIAH involves all Finnish universities with research in SSH, including the coordinator University of Helsinki (Faculty of Arts, Faculty of Social Sciences, and National Library), CSC – IT Center for Science Ltd., Aalto University, Tampere Universities, the University of Eastern Finland, and the universities of Jyväskylä and Turku. In addition, the project collaborators of FIN-CLARIAH are the universities of Vaasa and Oulu, the Institute for the Languages of Finland, and the National Archives of Finland.

Our Mission: Finnish Linked Open Data Infrastructure for Digital Humanities (LODI4DH)

The Aalto work in FIN-CLARIAH concerns maintaining and further developing the Linked Open Data Infrastructure for Digital Humanities in Finland (LODI4DH) in collaboration with the University of Helsinki (HELDIG, Faculty of Arts) and other partners within the DARIAH-FI part of FIN-CLARIAH. The work also includes language infrastructures for spinning the Semantic Web and collaboration with FIN-CLARIN and CLARIN-EU. Our work is part of the cooperative partnership agreement between Aalto and DARIAH-EU.

Figure 1. Elements of national semantic web infrastructure

Figure 1 depicts the elements needed to develop a national Semantic Web infrastructure, according to the experiences reported in this paper. The system is based on domain-agnostic W3C Web Standards and Best Practices for publishing Linked Data (on the lower left in the figure). Data Models are needed for representing metadata and knowledge of different application domains, populated by resources taken from shared domain Ontologies and Ontology Services for interoperability. The ontologies should be openly available and easy to access for interoperability and re-use, based on shared ontology services/libraries. In the same vein, data services for publishing LD datasets, preferably under open licenses such as Creative Commons, are needed to make re-use of data possible and easy. Applications of Linked Data are also part of the infrastructure, connecting the system to its end users. To make all this possible, Software Tools are needed for aggregating distributed heterogeneous data from the legacy systems and other data silos involved, and for extracting and linking (disambiguating) entities and relations from data records and textual descriptions. Tools for data publishing and analysis are also needed, as well as tooling for developing new applications for end users.

Since 2001, the SeCo group has been working on publishing and using linked data of Cultural Heritage on the Semantic Web and in Digital Humanities. In FIN-CLARIAH our goal is to make selected results of this work available to external users along the pipeline and components outlined below. The work proceeds step by step, starting from the more mature software tools and services that have already been used in our earlier research projects.

We hope that the most mature parts of the infrastructure, linked data, and applications, now maintained by the Semantic Computing Research Group (SeCo) at Aalto University and the University of Helsinki, will gradually be deployed by data owners and users in the Finnish Cultural Heritage sector, such as the National Archives, Finnish Literature Society, Finnish Heritage Agency, Finnish Institute for Languages, National Library, Ministry of Justice, and Parliament of Finland. Data from these organizations and others have been enriched, linked, and published on the Linked Data Finland platform and used in the Sampo portals in use in Finland. Finding a sustainable solution for maintaining the services and the underlying infrastructure through the work in FIN-CLARIAH would be desirable.

Implementation: Supporting Infrastructure Pipeline and Components

Our work in FIN-CLARIAH falls into several areas that need to be covered in order to create data and services for DH research:

  • Speech2Text. Tooling for creating time-stamped textual representations of video and audio recordings. Here the goal is, e.g., to facilitate the preservation of intangible cultural heritage and easy access to it, as in the WarMemoirSampo system that publishes interview memoirs of WW2 veterans.
  • Image2Text. OCR services developed, e.g., for the historical minutes of the Parliament of Finland in the ParliamentSampo system.
  • Text2Knowledge. Finnish language toolkit & web services for linked data knowledge extraction from unstructured Finnish texts, including named entity recognition and linking, automatic keyword annotation, relation extraction, and semantic labeling. This work has been carried out, e.g., in our various systems related to biographical texts, such as BiographySampo and AcademySampo.
  • Knowledge2DataAnalysis. Reusable tooling for Digital Humanities on top of a linked data service and SPARQL endpoint, as used in various Sampo systems.
  • DataAnalysis2AI. Tooling for knowledge discovery and computational creativity. Here the machine is seen as an intelligent agent that searches knowledge graphs for interesting patterns on its own, solves problems, and even explains the results to the human user (to support "third-generation DH systems" as suggested in this paper).
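
The Knowledge2DataAnalysis step above builds on a linked data service and its SPARQL endpoint. As a minimal sketch of what such tooling starts from, the Python example below builds a SPARQL 1.1 Protocol request URL and flattens a result document in the W3C SPARQL JSON results format into plain rows. The endpoint URL, the query, and the class IRI are hypothetical placeholders and are not taken from any Sampo system; the sample response is hand-written so that the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint URL; replace with the SPARQL endpoint of the dataset under study.
ENDPOINT = "https://example.org/sparql"

# Illustrative query; the class IRI is an assumption, not part of any Sampo data model.
QUERY = """
SELECT ?person ?label WHERE {
  ?person a <http://example.org/schema/Person> ;
          <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a GET URL using the standard 'query' parameter of the SPARQL Protocol."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(result_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into a list of plain dictionaries."""
    doc = json.loads(result_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from a binding, so default them to "".
        rows.append({v: binding.get(v, {}).get("value", "") for v in variables})
    return rows


# Hand-written sample response in the standard result format (not real data).
SAMPLE = json.dumps({
    "head": {"vars": ["person", "label"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "http://example.org/p/1"},
         "label": {"type": "literal", "value": "Example Person"}},
        {"person": {"type": "uri", "value": "http://example.org/p/2"}},
    ]},
})

rows = bindings_to_rows(SAMPLE)
print(build_request_url(ENDPOINT, QUERY))
print(rows)
```

Rows in this shape can be passed on to whatever analysis or visualization library a project prefers; the actual tooling used in the Sampo systems may be organized quite differently.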

Infrastructure components to be maintained and built in our part of the FIN-CLARIAH initiative include:

  • ONKI ontology services (ONKI.fi) for history, extending the Finto.fi services of the National Library. This work comprises ontologies for historical persons, places, events, times, occupations, and names.
  • Historical map services (Hipla.fi). Here historical maps can be aligned with contemporary ones and used as layers in applications, based on the MapWarper tool and linked data for storing related metadata.
  • Linked Data Finland (LDF.fi). This platform is used for publishing linked data as services using the standards and best practices of W3C. Our focus here is on using the “7-star” model, which extends Tim Berners-Lee's classic 5-star model, for better reusability and quality of linked datasets.
  • Natural language processing toolkit and services for extracting linked data.
  • Learning materials. Providing the DH community with more educational online material on using linked data, such as developing the Linked Data School Linda.
  • Maintaining the Sampo Series of linked open data services and semantic portals in use in Finland and the Sampo-UI framework for developing Sampo applications. In particular, the following Sampos are initially in focus:
    • NameSampo (main data owners: Finnish Institute of Languages, National Survey)
    • BiographySampo (main data owners: Finnish Literature Society (SKS), Edita Publishing, and others)
    • WarSampo, WarVictimSampo 1914–1922, and WarMemoirSampo (main data owners: National Archives, Defence Forces, Tammenlehvän Perinneliitto ry)
    • AcademySampo (main data owners: University of Helsinki Archives, National Archives)
    • FindSampo (main data owners: Finnish Heritage Agency, National Museum, British Museum (UK))
    • Mapping Manuscript Migrations Sampo (main data owners: Oxford University (UK), Schoenberg Institute (US), IRHT (Paris))
    • LetterSampo (main data owners: Huygens Institute (NL), Berlin-Brandenburg Academy of Sciences (D), Oxford University (UK))
    • LawSampo (main data owners: Ministry of Justice, Edita Publishing)
    • ParliamentSampo (main data owners: Parliament of Finland, Finnish Literature Society)
    • CoCoSampo (main data owners: Various Finnish archives for epistolary data (letters), including National Archives, National Library, National Gallery, Åbo Academy, Finnish Literature Society, Svenska Litteratursällskapet i Finland, and many others)
    • OperaSampo (main data owner: Sibelius Academy)
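
The “7-star” model mentioned for Linked Data Finland extends the 5-star scheme with, according to the abstract of the 2014 paper listed under Publications, a sixth star for providing the dataset with a schema and a seventh for validating the data against that schema. The Python sketch below illustrates how such a rating could be computed. It assumes the stars are cumulative, as in the 5-star scheme; the class and field names are hypothetical and do not reflect how LDF.fi itself records or checks these properties.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical checklist for one dataset; the field names are illustrative."""
    open_license_on_web: bool          # star 1: on the web under an open license
    machine_readable: bool             # star 2: structured, machine-readable data
    non_proprietary_format: bool       # star 3: non-proprietary format
    uses_w3c_standards_and_uris: bool  # star 4: W3C standards and URIs for things
    links_to_other_data: bool          # star 5: linked to other data
    schema_published: bool             # star 6: a schema that explains the dataset
    validated_against_schema: bool     # star 7: data validated against the schema


def star_rating(profile: DatasetProfile) -> int:
    """Count stars cumulatively: a star counts only if all earlier stars hold."""
    checks = [
        profile.open_license_on_web,
        profile.machine_readable,
        profile.non_proprietary_format,
        profile.uses_w3c_standards_and_uris,
        profile.links_to_other_data,
        profile.schema_published,
        profile.validated_against_schema,
    ]
    stars = 0
    for satisfied in checks:
        if not satisfied:
            break
        stars += 1
    return stars


# Example: a linked dataset with a published schema that has not been validated yet.
example = DatasetProfile(True, True, True, True, True, True, False)
print(star_rating(example))
```

Treating the stars as a strict ladder keeps the rating easy to interpret, but it is a design choice of this sketch; a platform could equally report each criterion separately.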

More Information about the Infrastructure

The following short presentation at the DARIAH Annual Meeting 2023 in Budapest gives an overview of our work related to FIN-CLARIAH:

Finnish LOD Infra and Sampo portals, DARIAH Annual Meeting, Budapest, 2023 from SeCo Research Group on Vimeo.

The keynote presentation video of the DCMI 2021 conference below, the related paper How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web, the paper Digital Humanities on the Semantic Web: Sampo model and portal series, and the other papers listed below give an overview of our work on developing a national Semantic Web infrastructure in Finland and its applications. For a full account of SeCo research on this topic, see the SeCo Publications List.

Making National Linked Open Data Services Sustainable

Here is a video (in Finnish) suggesting one way to make the Linked Open Data services and Sampo portals sustainable. Would establishing a joint collaborative Linked Open Data Centre run by the memory organizations be a feasible solution?

Ehdotus Sampo-portaalien ja -datapalveluiden vakinaistamiseksi ("A proposal for making the Sampo portals and data services permanent") from SeCo Research Group on Vimeo.

The FIN-CLARIAH work is funded by the Research Council of Finland under the NextGeneration funding programme of the European Union, as part of the national research infrastructure programme FIRI 2021. The first phase of the initiative lasted 2022-2023 and the second 2024-2025.

Contact

Professor Eero Hyvönen
Aalto University and University of Helsinki (Helsinki Centre for Digital Humanities HELDIG)


Publications

2024

Annastiina Ahola, Lilli Peura, Rafael Leal, Heikki Rantala and Eero Hyvönen: Using generative AI and LLMs to enrich art collection metadata for searching, browsing, and studying art history in Digital Humanities. Proceedings, 2nd International Conference on Data & Digital Humanities Generative Artificial Intelligence for Text and Multimodal Data 12th - 13th December 2024, University of Minho, Braga, Portugal, November, 2024. Accepted, forthcoming.
Eero Hyvönen and Jouni Tuominen: 8-star Linked Open Data Model: Extending the 5-star Model for Better Reuse, Quality, and Trust of Data. Posters, Demos, Workshops, and Tutorials of the 20th International Conference on Semantic Systems (SEMANTiCS 2024), vol. 3759, CEUR Workshop Proceedings, September, 2024.

2023

Eero Hyvönen: Creating and Using a National Linked Open Data Infrastructure for Cultural Heritage Applications and Digital Humanities Research: Lessons Learned. DARIAH Annual Event 2023, Budapest, Hungary, abstracts of papers, DARIAH-EU, June, 2023.
Eero Hyvönen: How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Programming and Data Infrastructure in Digital Humanities, Book of Abstracts, p. 7, High Performance Computing Centre, University of Évora, Portugal, March, 2023.
Minna Tamper, Laura Sinikallio, Jouni Tuominen and Eero Hyvönen: Transforming Linguistically Annotated Finnish Parliamentary Debates Into the Parla-CLARIN Format. Digital Humanities in the Nordic and Baltic Countries Seventh Conference (DHNB 2023), Book of Abstracts (Sofie Gilbert and Annika Rockenberger (eds.)), p. 118, University of Oslo Library, Oslo, Norway, March, 2023.

2014

Eero Hyvönen, Jouni Tuominen, Miika Alonen and Eetu Mäkelä: Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets. The Semantic Web: ESWC 2014 Satellite Events. ESWC 2014 (Presutti, V., Blomqvist, E., Troncy, R., Sack, H., Papadakis, I. and Tordai, A. (eds.)), pp. 226-230, Springer-Verlag, May, 2014.
The idea of Linked Data is to aggregate, harmonize, integrate, enrich, and publish data for re-use on the Web in a cost-efficient way using Semantic Web technologies. We address two major hindrances to re-using Linked Data: It is often difficult for a re-user to 1) understand the characteristics of the dataset and 2) evaluate the quality of the data for the intended purpose. This paper introduces the “Linked Data Finland” platform LDF.fi addressing these issues. We extend the famous 5-star model of Tim Berners-Lee with a sixth star for providing the dataset with a schema that explains the dataset, and a seventh star for validating the data against the schema. LDF.fi also automates data publishing and provides data curation tools. The first prototype of the platform is available on the web as a service, hosting tens of datasets and supporting several applications.

2009

Jouni Tuominen, Matias Frosterus, Kim Viljanen and Eero Hyvönen: ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies and Ontologies as Services. Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pp. 768-780, Springer-Verlag, Heraklion, Greece, May 31 - June 4, 2009.
Vocabularies are the building blocks of the Semantic Web providing shared terminological resources for content indexing, information retrieval, data exchange, and content integration. Most semantic web applications in practical use are based on lightweight ontologies and, more recently, on the Simple Knowledge Organization System (SKOS) data model being standardized by W3C. Easy and cost-efficient publication, integration, and utilization methods of vocabulary services are therefore highly important for the proliferation of the Semantic Web. This paper presents the ONKI SKOS Server for these tasks. Using ONKI SKOS, a SKOS vocabulary or a lightweight ontology can be published on the web as ready-to-use services in a matter of minutes. The services include not only a browser for human usage, but also Web Service and AJAX interfaces for concept finding, selecting and transporting resources from the ONKI SKOS Server to connected systems. Code generation services for AJAX and Web Service APIs are provided automatically, too. ONKI SKOS services are also used for semantic query expansion in information retrieval tasks. The idea of publishing ontologies as services is analogous to Google Maps. In our case, however, vocabulary services are provided and mashed-up in applications. ONKI SKOS was published in the beginning of 2008 and is to our knowledge the first generic SKOS server of its kind. The system has been used to publish and utilize some 60 vocabularies and ontologies in the National Finnish Ontology Service ONKI www.yso.fi.
Kim Viljanen, Jouni Tuominen and Eero Hyvönen: Ontology Libraries for Production Use: The Finnish Ontology Library Service ONKI. Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pp. 781-795, Springer-Verlag, Heraklion, Greece, May 31 - June 4, 2009.
This paper discusses problems of creating and using ontology library services in production use. One approach to a solution is presented with an online implementation--the Finnish Ontology Library Service ONKI--that is in pilot use on a national level in Finland. ONKI contributes to previous research on ontology libraries in many ways: First, mashup and web service support with various tools is provided for cost-efficient utilization of ontologies in indexing and search applications. Second, services covering the different phases of the ontology life cycle are provided. Third, the services are provided and used in real world applications on a national scale. Fourth, the ontology framework is being developed by a collaborative effort by organizations representing different application domains, such as health, culture, and business.

2008

Eero Hyvönen, Kim Viljanen, Jouni Tuominen and Katri Seppälä: Building a National Semantic Web Ontology and Ontology Service Infrastructure - The FinnONTO Approach. Proceedings of the European Semantic Web Conference ESWC 2008, pp. 95-109, Springer, Tenerife, Spain, June, 2008.
This article presents the vision and results of creating a national level cross-domain ontology service infrastructure in Finland in the FinnONTO project. The novelty of the infrastructure is based on two ideas. First, a system of open source core ontologies is being developed by transforming thesauri into mutually aligned lightweight ontologies, including a top ontology of 20,000 concepts that is extended by various domain specific ontologies. Second, the ONKI Ontology Server framework for publishing ontologies as ready to use services has been designed and implemented. ONKI provides legacy and other applications with ready to use functionalities for using ontologies on the user interface level as semantic widgets. The idea is to use ONKI for creating mash-up applications in a way analogous to using Google or Yahoo Maps, but in our case external applications are mashed-up with ontology support. The ontology framework presented is operational on the web and is being used in creating the application demonstrations.