Linked Data Finland (2012-2014)
Abstract in English
Linked Data Finland is a national research project that aims to develop new technology for harvesting, publishing, and utilizing open data using the Linked Data approach and semantic web technologies. The project is supported and funded by a large multidisciplinary consortium of 20 organizations, companies, and public sector institutions, including three Finnish ministries.
The project builds upon the legacy, results (ontologies, ontology services, tools, pilot applications), and
established collaboration network of the FinnONTO project series 2003-2012.
The main results of the LDF project include:
- Linked Data Finland data portal LDF.fi. The idea here is to extend the "5-star" Linked Data
model into a "7-star" model for enhanced re-usability of linked datasets. The portal contains SPARQL endpoints and additional tools and services with tens of datasets. See LDF.fi for more details.
- The datasets are studied and piloted in various application areas, including Cultural Heritage, Law, Libraries, News, Ornithology, and History. See, e.g., the case study Finnish Law as a Linked Data Service. Links to applications and data visualizations can be found at the dataset home pages of LDF.fi.
- Related publications can be found at the end of this page.
The project results were presented on January 24, 2014, in 21 presentations at the "Linked Open Data Finland" seminar.
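The 7-star model mentioned in the results above can be illustrated with a minimal sketch. It assumes the usual reading of Tim Berners-Lee's five stars, adds the sixth star (a schema that explains the dataset) and the seventh star (data validated against that schema) as described in the publications listed on this page, and assumes that stars are cumulative. The class and field names are hypothetical and are not part of the LDF.fi platform.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Publication properties of a dataset (illustrative field names)."""
    open_license: bool = False      # star 1: on the web under an open license
    machine_readable: bool = False  # star 2: structured, machine-readable data
    open_format: bool = False       # star 3: non-proprietary format
    uses_rdf_uris: bool = False     # star 4: W3C standards such as RDF and URIs
    linked_to_others: bool = False  # star 5: linked to other datasets
    has_schema: bool = False        # star 6: schema that explains the dataset
    validated: bool = False         # star 7: data validated against the schema


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion.

    Treating the stars as cumulative is an assumption of this sketch.
    """
    criteria = [
        ds.open_license,
        ds.machine_readable,
        ds.open_format,
        ds.uses_rdf_uris,
        ds.linked_to_others,
        ds.has_schema,
        ds.validated,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


five_star = Dataset(True, True, True, True, True)
seven_star = Dataset(True, True, True, True, True, True, True)
print(star_rating(five_star), star_rating(seven_star))
```

Under the cumulative assumption, a dataset that has been validated but ships without a published schema would still stop at five stars.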
[Translated from Finnish]
Goals, organization, and description of the Linked Data Finland project
One of the newest megatrends of the World Wide Web is Linked (Open) Data, the Web of Data, in which intelligent web services connect and make use of web content with the help of large, open semantic data repositories. At their core are, among others, DBpedia, mined from Wikipedia, Google's encyclopedic Freebase, and more than two hundred other datasets linked to these and to each other in the so-called Linked Open Data cloud. The goal of the national Linked Data Finland project (2012-2014), led by Aalto University and the University of Helsinki, is to develop and promote the use of linked open data repositories on the web in Finland by means of the latest semantic web technologies. The project provides a technical and scientific backbone for the will that has emerged in Finland to open public data repositories and to use them in public services and business, a goal that is also included in, among other things, the new government programme. With a budget of about one million euros, the project is supported in its initial phase by a multidisciplinary consortium of Tekes and 22 companies and public organizations, including three different ministries.
One key result of the project is the "7-star" Linked Data Finland portal LDF.fi, where tens of linked datasets have been published from numerous application areas, such as culture, law, libraries, media, research datasets, and history. These are best explored directly through the dataset home pages of the portal, which also contain links to various tools, services, data visualizations, and applications.
More information:
"Success story video" on Linked Data Finland, commissioned by the Prime Minister's Office
Linked Data Finland brochure (PDF) at the Prime Minister's Office event Avoin Suomi 2014.
More detailed project description (PDF)
Linked Data Finland portal LDF.fi
Press release on the project launch (PDF)
The "Linked Open Data Finland" seminar, which presented the project results.
Contact Person
Prof. Eero Hyvönen, Research Director
Publications
2021
Eero Hyvönen:
How to Create a National Cross-domain Ontology and Linked Data Infrastructure and Use It on the Semantic Web. Oct, 2021. Keynote presentation for the DCMI 2021 conference.
bib pdf The vision behind the Semantic Web is to build a global Web of Data (Giant Global Graph, GGG) for machines to use: based on this an interoperable and intelligent transnational WWW for humans can be created cost-efficiently. This keynote presentation for the DCMI 2021 conference addresses this grand challenge on a national level, as in practice much of the data available are often related to each other within national cultures, borders, organizations, and are represented using national languages, metadata models, vocabularies, and local conventions. This presentation overviews and discusses the vision and lessons learned in Finland on developing and deploying a cross-domain national ontology service infrastructure and Linked Open Data (LOD) publishing framework, extending the classic 5-star model to a 7-star model for better data re-usability (6th star) and quality (7th star). To test and demonstrate the infrastructure, a series of semantic portals and LOD services have been created using the Sampo model that has evolved gradually in 2002-2021 through lessons learned when developing and publishing the Sampo series of systems, including MuseumFinland (2004), HealthFinland (2009), CultureSampo (2009), BookSampo (2011), WarSampo (2015), BiographySampo (2018), NameSampo (2019), WarVictimSampo (2019), Mapping Manuscript Migrations (2020), AcademySampo (2021), as well as FindSampo, LawSampo, and ParliamentSampo underway. These systems cover a wide range of application domains and have attracted up to millions of users on the Semantic Web depending on the application, suggesting feasibility of the proposed model. This work shows a shift of focus in research on semantic portals from data aggregation and exploration systems (1st generation systems) to systems supporting research with data analytic tools (2nd generation systems), and finally to automatic knowledge discovery and Artificial Intelligence (3rd generation systems).
2018
Arttu Oksanen, Jouni Tuominen, Eetu Mäkelä, Minna Tamper, Aki Hietanen, and Eero Hyvönen:
Semantic Finlex: Finnish Legislation and Case Law as a Linked Open Data Service.
Proceedings of Law via the Internet 2018 (LVI 2018), Knowledge of the Law in the Big Data Age, abstracts, Florence, Italy, October, 2018.
bib pdf
2015
Mikko Koho:
Linked Data -palvelu luontohavaintoaineistoille. MSc Thesis (in Finnish), University of Helsinki, Department of Computer Science, February, 2015.
bib pdf link Publishing biological observation datasets as linked data makes it possible to combine several datasets with one another. By combining several datasets related to the same subject, a better understanding of the phenomenon of interest can be reached than by studying the datasets separately. This enables more accurate conclusions to be drawn from the datasets and makes it possible to search for expected or unexpected connections between them. The RDF data model used in linked data makes the datasets machine-readable and provides an easy way to refer to all parts of the datasets. Datasets published as linked data can easily be enriched with further datasets. This thesis discusses the modeling, processing, and use as linked data of the observation dataset of the Hanko Bird Observatory and the weather observation dataset of the Finnish Meteorological Institute's Hanko Russarö station. The datasets have been modeled using the RDF Data Cube vocabulary, which improves their interoperability. The bird observation dataset has been annotated with species information using an ontology of Finnish birds, which has been enriched with, among other things, an ontology of species characteristics and with conservation status data. The datasets have been published on the Linked Data Finland platform, and a prototype visualization service has been developed to illustrate the connections between the datasets. Weather is known to be one of the most important factors affecting the intensity of daily bird migration. The visualization service aims to show the user how the weather affects bird observation counts and, in particular, observed bird migration. Better knowledge of the relationships between the datasets enables more accurate conclusions to be drawn from the bird observation dataset. The methods presented in the thesis can be generalized beyond bird and weather observation datasets to other datasets with a similar structure.
2014
Esko Ikkala, Eetu Mäkelä and Eero Hyvönen:
TourRDF: Representing, Enriching, and Publishing Curated Tours Based on Linked Data.
19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014), Demo and Poster Papers, November, 2014.
bib pdf Current mobile tourist guide systems are developed and used in separate data silos: each system and vendor tends to use its own proprietary, closed formats for representing tours and point of interest (POI) content. As a result, tour data cannot be enriched from other providers’ tour and POI repositories, or from other external data sources, even when such data are made publicly available by, e.g., cities willing to promote tourism. This paper argues that an open, shared RDF-based tour vocabulary is needed to address these problems, and introduces such a model, TourRDF, extending the earlier TourML schema into the era of Linked Data. As a test and an evaluation of the approach, a case study based on data about the UNESCO World Heritage site Suomenlinna fortress is presented.
Eero Hyvönen, Miika Alonen, Esko Ikkala and Eetu Mäkelä:
Life Stories as Event-based Linked Data: Case Semantic National Biography.
Proceedings of ISWC 2014 Posters & Demonstrations Track, CEUR Workshop Proceedings, October, 2014.
bib pdf link This paper argues, by presenting a case study and a demonstration on the web, that biographies make a promising application case of Linked Data: the reading experience can be enhanced by enriching the biographies with additional lifetime events, by providing the user with a spatio-temporal context for reading, and by linking the text to additional contents in related datasets.
Eero Hyvönen, Jouni Tuominen, Miika Alonen and Eetu Mäkelä:
Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets.
The Semantic Web: ESWC 2014 Satellite Events. ESWC 2014 (Presutti, V., Blomqvist, E., Troncy, R., Sack, H., Papadakis, I. and Tordai, A. (eds.)), pp. 226-230, Springer-Verlag, May, 2014.
bib pdf link The idea of Linked Data is to aggregate, harmonize, integrate, enrich, and publish data for re-use on the Web in a cost-efficient way using Semantic Web technologies. We address two major hindrances for re-using Linked Data: It is often difficult for a re-user to 1) understand the characteristics of the dataset and 2) evaluate the quality of the data for the intended purpose. This paper introduces the “Linked Data Finland” platform LDF.fi addressing these issues. We extend the famous 5-star model of Tim Berners-Lee with the sixth star for providing the dataset with a schema that explains the dataset, and the seventh star for validating the data against the schema. LDF.fi also automates data publishing and provides data curation tools. The first prototype of the platform is available on the web as a service, hosting tens of datasets and supporting several applications.
Mikko Koho, Eero Hyvönen and Aleksi Lehikoinen:
Ornithology Based on Linking Bird Observations with Weather Data.
The Semantic Web: ESWC 2014 Satellite Events, vol. 8798, pp. 75-85, Springer, May, 2014.
bib pdf link This paper presents first results of a use case of Linked Data for eScience, where 0.5 million rows of bird migration observations over a 30-year time span are linked with 0.1 million rows of related weather observations and a bird species ontology. Using the enriched linked data, biology researchers at the Finnish Museum of Natural History will be able to investigate temporal changes in bird biodiversity and how weather conditions affect bird migration. To support data exploration, the data is published in a SPARQL endpoint service using the RDF Data Cube model, on which semantic search and visualization tools are built.
2013
Eero Hyvönen:
Linked Data Finland. Terminfo, no. 4, Finnish Terminology Centre TSK, Helsinki, Finland, 2013.
bib
Matias Frosterus:
ONKI-projekti luo kansallista ontologiapalvelua. Terminfo, no. 4, Finnish Terminology Centre TSK, Helsinki, Finland, 2013.
bib
Thea Lindquist, Michael Dulock, Juha Törnroos, Eero Hyvönen and Eetu Mäkelä:
Using Linked Open Data to Enhance Subject Access in Online Primary Sources. Cataloging & Classification Quarterly, vol. 51, no. 8, Taylor & Francis, 2013.
bib link Using online primary sources is both rewarding and challenging for users. Improving subject access is essential as these sources become increasingly important in educational curricula. A user needs assessment with humanities users showed improving findability and context for historical subjects were major needs. Linked Data can help by linking related concepts in the sources using specialized vocabularies, enriching them with outside resources, and enabling semantic services that empower users. This article discusses a project to enhance subject access in an online World War I collection by deep linking historical data on the civilian experience in occupied Belgium and France.
Eero Hyvönen, Miika Alonen, Jouni Tuominen, and Eetu Mäkelä:
Linked Data Finland: Towards a 7-star Service Platform for Linked Datasets.
The First Annual KnowEscape Conference - KnowEscape 2013, Espoo, Finland, November, 2013.
bib pdf The idea of opening data on the Web as Linked Data (LD) is widely adopted in areas such as public government, science, libraries, and cultural heritage. The key idea is to harmonize, integrate, enrich, and re-use existing data repositories in a cost-efficient way via standard APIs in novel applications. This paper concerns two major hindrances for re-using LD: It is often difficult for a re-user to 1) understand the characteristics of the dataset and 2) evaluate the quality of the data for her intended purpose. This paper introduces the “Linked Data Finland” publishing platform LDF.fi addressing these issues. In order to enhance and promote reusability, we propose extending the famous 5-star model of Tim Berners-Lee into a 7-star model: The sixth star requires that the dataset is defined and explained in terms of explicit schemas. Explicit schemas make it possible to explain to the re-user the intended characteristics of the data by, e.g., documentation about the schemas, and how the schemas (vocabularies) are actually used in the given dataset. The seventh star is given if the data has also been validated w.r.t. the schema specifications. The results of the validation may be a human-readable document and/or a machine-readable representation regarding the quality issues found in the data. This paper reports on work in progress, but the first prototype of the platform is already operational on the web as a service http://ldf.fi.
Miika Alonen, Tomi Kauppinen, Osma Suominen and Eero Hyvönen:
Exploring the Linked University Data With Visualization Tools.
The Semantic Web: ESWC 2013 Satellite Events, pp. 204-208, Springer-Verlag, Berlin Heidelberg, Montpellier, France, May 26-30, 2013.
bib pdf University data is typically stored in separate data silos even though the data is implicitly richly related together. Such data has a large and diverse user base, including faculty members, students, industrial partners, alumni, collaborating universities, and media. In this paper, we demonstrate two tools for understanding and using the contents of linked university data. The first tool, Visualization Playground (VISU), supports querying and visualizing the data for example for illustrating emerging trends in universities (e.g., about publications) and for comparing differences. The second tool, Vocabulary Visualizer (V^2), demonstrates the usage of vocabularies in the Linked University Data Cloud. It reveals what kinds of data different universities have published, and what terms are used to describe the contents. Such analysis is a basis for facilitating design of Linked Data applications across university data boundaries.
Matias Frosterus, Jouni Tuominen, Sini Pessala, Katri Seppälä and Eero Hyvönen:
Linked Open Ontology Cloud KOKO--Managing a System of Cross-domain Lightweight Ontologies.
The Semantic Web: ESWC 2013 Satellite Events, pp. 296-297, Springer-Verlag, Berlin Heidelberg, Montpellier, France, May 26-30, 2013.
bib pdf
Jouni Tuominen, Nina Laurenne, Mikko Koho and Eero Hyvönen:
The Birds of the World Ontology AVIO.
The Semantic Web: ESWC 2013 Satellite Events, pp. 300-301, Springer-Verlag, Berlin Heidelberg, Montpellier, France, May 26-30, 2013.
bib pdf We present an ontology for managing the scientific and common names of birds. The ontology is based on the TaxMeOn meta-ontology model for biological names. The ontology is in use as an ontology service and it has been applied in a bird watching system.
Matias Frosterus, Jouni Tuominen, Mika Wahlroos and Eero Hyvönen:
The Finnish Law as a Linked Data Service.
The Semantic Web: ESWC 2013 Satellite Events, pp. 289-290, Springer-Verlag, Berlin Heidelberg, Montpellier, France, May 26-30, 2013.
bib pdf Juridical information is important to organizations and individuals alike and is linked to from all walks of life. The Finnish government has published the Finlex Data Bank for searching and browsing legislation documents. However, the data there is not yet open, is based on a traditional XML schema, and does not conform to new semantic metadata standards. There are many difficulties in maintaining and using the site in, e.g., data harvesting, interoperability, querying, and linking that could be mitigated by the Semantic Web technologies. This paper presents an approach and a project—including first results—for publishing and using Finnish legislation as a 5-star Linked Open Data service.
Jouni Tuominen, Nina Laurenne and Eero Hyvönen:
Publishing and Using Plant Names as an Ontology Service.
Proceedings of the first international Workshop on Semantics for Biodiversity (S4BioDiv), ESWC 2013, CEUR Workshop Proceedings, Vol 979, Montpellier, France, May, 2013.
bib pdf link Animals and plants are referred to using scientific or common names depending on the expertise of an audience or a source of data. The names change in time and therefore their usage as identifiers as such is problematic. We present a solution for managing and using plant names as an ontology. The ontology is based on the TaxMeOn meta-ontology for biological names. In order to refer to organisms unambiguously and publish information as Linked Data on the web, the names are given URIs. The ontology is developed collaboratively and it supports the approval process and temporal tracking of the common names. We introduce an ontology service of plant names for end-users and provide user interfaces and APIs for integrating the ontology into applications.
Eero Hyvönen, Miika Alonen, Mikko Koho and Jouni Tuominen:
BirdWatch--Supporting Citizen Scientists for Better Linked Data Quality for Biodiversity Management.
Proceedings of the first international Workshop on Semantics for Biodiversity (S4BioDiv), ESWC 2013, CEUR Workshop Proceedings, Vol 979, Montpellier, France, May, 2013.
bib pdf link Observational data about species of public interest, such as birds and butterflies, is often created and collected by volunteer citizen scientists, and used by professionals for managing biodiversity. The education and skills of the citizens participating in the work vary widely, and the process of making observations is typically not systematic but rather ad hoc. As a result, the quality of the observational data in repositories, such as the Global Biodiversity Information Facility GBIF Data Portal, is often not good, hampering its utilization severely. This paper presents an approach for enhancing data quality in a citizen science setting, and presents a mobile tool BirdWatch for citizen observers, mitigating difficulties in producing high quality Linked Data for biodiversity management.