Lunch with the FRDN Knowledge Hub data stewards: Towards recommended data repositories
By Niek Van Wettere and Alexander Botzki
Should we opt for the standard repository of a particular research community, even if that repository is non-European or not EOSC-federated, or should we give precedence to a European alternative, even if it is less well known in that community, keeping the European data sovereignty strategy in mind? Which arguments would you use to explain to researchers that it is better to deposit their dataset in a discipline-specific repository than in a general one? Do we always exclude repositories without persistent identifier facilities and/or machine-readable re-use licences, or are there situations where this drawback is mitigated by other positive factors? How do you see the uptake of trusted repositories evolving? How mature is the European repository accreditation system at this point? These were the initial questions to be addressed during the second lunch session of the FRDN Knowledge Hub for data stewards.
Around 40 data stewards, data managers and research support staff joined this second session of the Knowledge Hub lunch meetings, which featured presentations on the data repository landscape.
Evy Neyens (UHasselt) set the scene by talking about the FOSB Generic metadata model and the bridge towards repositories as a public reservoir for datasets. Thereafter, Niek Van Wettere (VUB), Stefanie De Bodt (UGent), Elien Dewitte (VLIZ) and Alexander Botzki (VIB) gave overviews of the European repository landscape in Humanities, Engineering, and Life and Marine Sciences. It quickly became evident during the presentations that, depending on the research discipline, researchers face numerous challenges regarding the maturity and choice of repositories for data deposition and data sharing at the institutional, national and international level.
In the lively discussion following the presentations, the participants shared their views on how to deal with data submission to generic and/or discipline-specific data repositories. The audience quickly agreed that trusted and FAIR-enabling discipline-specific repositories provide essential services to researchers, making it easier for them to put FAIR-compliant long-term preservation into practice. Furthermore, additional services offered by discipline-specific repositories, such as dataset quality control, are a major advantage.
While certain research disciplines already have community-endorsed, discipline-specific repositories at their disposal, a lot of work still needs to be done to make trusted and accredited data repositories available to every researcher. To guide researchers affiliated with Flemish research performing organizations towards the repositories they should preferably use, a Flemish “white list” of approved repositories could be established, especially for those research domains where more formal European certification is not yet widespread. One participant stated that “there is a role for the funders here to stimulate researchers to make use of certified repositories. It can be a requirement like open access publications and open data. Also, when the FOSB WG M&S will start with discipline-specific metadata standards, certified repositories can be recommended within the allowed values fields.”
Alongside these many advantages of generic and discipline-specific repositories, important concerns were raised about the submission of person-related research data to repositories. Person-related data is prevalent in certain disciplines such as life sciences and humanities. However, since the first focus of EOSC is on non-sensitive data that can be made openly available, most European data repositories currently do not have adequate security and/or legal provisions to accommodate the sensitivity of this type of data. Hence the need for a Flemish archiving solution to remedy this infrastructure gap. Additionally, it is of the utmost importance that we, as a scientific community, ensure that sustainable business models are worked out for European repositories, to guarantee the longevity of all the valuable datasets that have been deposited and will be deposited in ever larger quantities in the future.
The more we, as a community of data stewards and data managers, stimulate the use of data repositories and encourage scientists to provide highly curated metadata with their datasets, the more the re-use of datasets will take off, which is a necessary prerequisite for fully acknowledging datasets as a valuable research output in their own right. Of course, this hinges upon the availability of high-quality, sustainable data infrastructures in all research disciplines. Once those are in place, it will be a no-brainer for researchers to use these go-to places for datasets: their trusted repositories.