GROUP HAS COMPLETED THEIR WORK AND IS NO LONGER ACTIVE.
FOR QUERIES, PLEASE WRITE TO ‘INFO@FORCE11.ORG’
In the eScience ecosystem, the challenge of enabling optimal use of research data and methods is a complex one with multiple stakeholders: researchers wanting to share their data and interpretations; professional data publishers offering their services; software and tool builders providing data analysis and processing services; funding agencies (private and public) increasingly concerned with proper data stewardship; and a data science community mining, integrating and analysing the output to advance discovery. Computational analysis to discover meaningful patterns in massive, interlinked datasets is rapidly becoming a routine research activity. Providing machine-readable data as the main substrate for knowledge discovery, so that these eScientific processes run smoothly and sustainably, is one of the Grand Challenges of eScience.
This group's main aim is to create, and put up for community endorsement, a document that is a general 'guide to FAIRness of data', not a 'specification'. The guide is built around four FAIR Facets:
- Data should be Findable
- Data should be Accessible
- Data should be Interoperable
- Data should be Re-usable.
These FAIR Facets are related, but technically somewhat independent of one another, and may be implemented in any combination, incrementally, as data providers and FAIRports evolve toward increasing degrees of FAIRness. As such, the barrier to entry for FAIR data producers, publishers and stewards is kept as low as possible, with providers encouraged to gradually increase the number of FAIR Facets they comply with.
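As a purely illustrative sketch, and not part of the guiding principles themselves, the idea that facets are independent and can be adopted incrementally can be pictured as a set of flags. The names `Facet`, `facets_met` and `facets_missing` are hypothetical and chosen only for this example; how a provider actually establishes that a facet is satisfied is deliberately left open.

```python
from enum import Flag, auto


class Facet(Flag):
    """The four FAIR facets; each can be adopted independently of the others."""
    FINDABLE = auto()
    ACCESSIBLE = auto()
    INTEROPERABLE = auto()
    REUSABLE = auto()


# Fixed ordering used when reporting, following the F-A-I-R acronym.
ALL_FACETS = (
    Facet.FINDABLE,
    Facet.ACCESSIBLE,
    Facet.INTEROPERABLE,
    Facet.REUSABLE,
)


def facets_met(profile):
    """Return the names of the facets a (hypothetical) provider profile covers."""
    return [facet.name for facet in ALL_FACETS if facet in profile]


def facets_missing(profile):
    """Return the names of the facets the profile does not yet cover."""
    return [facet.name for facet in ALL_FACETS if facet not in profile]


# A provider may start with a single facet and add others over time,
# in any order and in any combination.
profile = Facet.FINDABLE
profile |= Facet.INTEROPERABLE
```

Here `facets_met(profile)` lists the facets adopted so far and `facets_missing(profile)` lists those still open, which mirrors the low barrier to entry described above: no facet is a prerequisite for another.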
In compiling the draft FAIR guiding principles, technical implementation choices have been consciously avoided. The minimal [FAIR Guiding Principles] are meant to guide implementers of FAIR data environments in checking whether their particular implementation choices do in fact render the resulting data FAIR. In the explanatory notes and annexes we give some non-binding explanation and guidance on a FAIR view of data and on what constitutes a repository of FAIR data (a 'Data FAIRport').
In January 2014, representatives of a range of these stakeholders came together at the Lorentz Center in Leiden, The Netherlands, at the request of the Netherlands eScience Center and the Dutch Techcentre for the Life Sciences (DTL), to discuss how to further enhance this ecosystem. From these discussions, the notion emerged that, through the definition and widespread support of a minimal set of community-agreed guiding principles and practices, data providers and data consumers – both machine and human – could more easily discover, access, interoperate with, and sensibly re-use, with proper citation, the vast quantities of information being generated by contemporary data-intensive science. These simple principles and practices should enable a broad range of integrative and exploratory behaviors, and support a wide range of technology choices and implementations, just as the Internet Protocol (IP) provided a minimal layer – the "waist" of an hourglass – that enabled the creation of a vast array of data provision, consumption, and visualization tools on the Internet.
FAIR for machines as well as people…
In eScience, two clearly separated substrates for knowledge discovery can be distinguished:
- The actual data, which is as a rule beyond human intellectual capacity to analyze
- The 'Explicitome' (everything we have already made explicit in text, databases and any other format to date).
The essence of eScience is that either functionally interlinked existing data, or the combination of such data with newly generated 'relatively small' datasets, leads to new insights. A crucial step is machine-assisted 'pattern recognition' in the data, which is followed by 'confirmational' human study of the Explicitome to rationalize patterns and determine testable hypotheses. This is a cyclical process by nature, but computational analysis of massive, originally dispersed and variable datasets is a crucial phase in any eScience process.
Recognizing this new grand challenge in contemporary science, the stakeholder group, at its inaugural meeting [Jointly Designing a Data FAIRport], coalesced around four desiderata that a modern data publishing environment should provide to support both manual and automated deposition, exploration, sharing, and use, by machines as well as humans.