ODK continuation Proposal: FAIRmat (FAIR Mathematical Data for the European Open Science Cloud)

Michael Kohlhase - 29 Jan 2019

A subgroup of the OpenDreamKit consortium centered in WP6 has just submitted a proposal in the European research infrastructures topic: Implementing the European Open Science Cloud: FAIR Mathematical Data for the European Open Science Cloud (FAIRmat).

The problem this proposal wants to solve is that the European Science Cloud currently under-represents mathematical data and data-based services.

The proposal builds on the FAIR principles (Findability, Accessiblity, Interoperability, and Reusability), and tries to realize them for mathematical data. To make FAIR meaningful for mathematical data, we need to represent the meaning of all three data categories (deep FAIR):

concrete data (i.e. mathematical knowledge represented as record or array data)
symbolic data (i.e. mathematical knowledge represented as terms)
linked data (aka. Metadata) The aim of the FAIRmat proposal is to standardize mathematical data representations, scale existing mathematical services and make them interoperable, and integrate them into the European Science Cloud for sustainability.

The FAIRmat consortium consists of the following partners:

#	Participant organisation name	Short name	Country
1	Friedrich-Alexander Universitaet Erlangen/Nuernberg (coordinator)	FAU	DE
2	Universite Paris-Sud	UPSud	FR
3	Chalmers University of Technology	CHA	SE
4	Univerza v Ljubljani	UL	SI
5	CAE Tech Limited	CAE	UK
6	FIZ Karlsruhe – Leibniz Institute for Information Infrastructure	FIZ	DE
7	European Mathematical Society	EMS	FI

Like the OpenDreamKit Proposal Process, the FAIRmat proposal was developed on a public repository.

We include the abstract for convenience below.

The scientific community increasingly considers datasets as their own kind of resource that should be shared and published individually, either in conjunction with a traditional paper or as a standalone digital artefact. Recently this trend has formed a positive feedback loop with the rise of deep learning methods, which require large datasets as input. Multiple national and international Open Science initiatives have been started to ensure the open availability, easy sharing, and reliable reproducibility of datasets in particular and data-driven research in general. The FAIR principles in particular have been developed as a goalpost for the openness of datasets, including in particular the sharing across disciplines and between research and industry.

As a rule, mathematicians strongly support the Open Science movement and happily make their datasets public for both practical and ethical reasons. This is accompanied by a vibrant and growing community of Open Source software for computational mathematics. However, the systematic FAIR sharing of mathematical data is very difficult due to the inherent complexity of the data. Therefore, today most mathematical data collections are shared in an ad hoc manner that is limited in scope and suffers from a lack of interlinking of digital artefacts across platforms. Thus, FAIR mathematics, while widely welcomed, is effectively non-existent today. A similar argument applies to related sciences to the extent that they make heavy use of mathematical data, e.g., the mathematical modeling of cyber-physical systems.

Generally, reusing shared data requires that the reuser be able to understand the semantics of the data. This is particularly difficult for system interoperability where the semantics must not only be evident but must itself be accessible for automated processing, and it is particularly critical where data is used in safety-critical systems. While this problem exists for all data, it is particularly challenging for mathematical and similar data in related disciplines where the semantics is very difficult to specify. Therefore, today there are virtually no mathematical datasets whose semantics is itself accessible.

This is evidenced by the wide gap in the service offerings of the EOSC when it comes to semantics-aware services in general and services for mathematical data in particular.

FAIRMat (pronounced “Fermat”) will deliver a framework and prototype service for the FAIR and semantics- aware sharing of mathematical data. It will meet the needs of mathematics research and education and will also bring added value to other disciplines and industry that work with data that exhibits complex structure or semantics. It will support all phases of the research life-cycle including the generation, publication, updating, extending, curation, search, reuse, and archival of mathematical data.

FAIRMat will be based on a sustainable software ecosystem including Open Source databases and services as well as nationally funded infrastructures. This includes TRL 6 and above technologies like the Modelica modeling language, the LMFDB database, the SageMath computer algebra suite, or the zbMATH publication information system.

Besides supporting the direct sharing of newly-produced data, FAIRMat will make existing datasets available in a uniform way. In particular, we will demonstrate the scalability of the FAIRMat infrastructure by integrating it with several large and important databases and services from diverse communities including databases of pure mathematical objects (graphs, integer sequences, elliptic curves, . . . ), formalized theorems and proofs, mathematical models used in engineering, linked data from Wikidata, and publication metadata. Finally, to maximize its long-term impact, FAIRMat is designed to culminate in the ISO standardization of its data representation format and the smooth integration of its services into the EOSC Hub at the end of the project lifetime.

FAIRMat will be carried out by a consortium of 7 sites with huge experience in designing and maintaining mathematical datasets and services. The majority of partners are long time Open Science promoters, with a strong experience in large open (software) project management. In particular, all produced documents (including this proposal itself), software, and data will be licensed under open licenses and freely available.

Front Page WP6: Data/Knowledge/Software bases Math-in-the-Middle Future funding Sustainability