Lunch with the FRDN Knowledge Hub data stewards: Metadata explained for research data

By Veerle Van den Eynden (KU Leuven)

Metadata seem to be the aspect of research data management that researchers find most difficult to describe in a data management plan (DMP), let alone to implement in their research. So at the FRDN Knowledge Hub lunch session on 27 January 2022 we looked at the basics of how and why metadata are applied to data for different purposes and for different types of data, and at how we as data stewards can advise researchers about metadata.

Metadata: The basics

Metadata are a structured form of documentation and information that explains data. They are typically machine readable and standardised, and can be applied to datasets at the level of the dataset and at the level of individual data points. Metadata play a role in making datasets Findable, Interoperable and Reusable.

First and foremost, researchers should make it a priority to add metadata to their datasets that explain clearly what the data mean, and to describe in their DMP how they will do this. Whilst this may seem too obvious to describe in a DMP, in reality there are still too many datasets that are difficult to understand because variables, codes or units are not well described. Variable descriptions, codebooks and Readme files can explain data, so these can be described in a DMP. Only once datasets are well documented with metadata can we start thinking about applying suitable community standards, controlled vocabularies and metadata standards.
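
As a minimal sketch of what a codebook can look like in practice, the Python example below pairs a tiny dataset with a codebook and checks that every variable and every coded value is documented. The dataset, the variable names and the two helper functions are all invented for illustration; they are not taken from the lunch session.

```python
import csv
import io

# Hypothetical dataset: column names and values are invented for illustration.
DATA_CSV = """participant_id,sex,height_cm
P001,1,172.5
P002,2,160.0
"""

# Hypothetical codebook: one entry per variable, with a label and,
# where relevant, a unit or the meaning of each code.
CODEBOOK = {
    "participant_id": {"label": "Pseudonymised participant identifier"},
    "sex": {
        "label": "Sex as recorded at intake",
        "codes": {"1": "male", "2": "female", "9": "unknown"},
    },
    "height_cm": {"label": "Standing height", "unit": "cm"},
}


def undocumented_variables(csv_text, codebook):
    """Return the column names in the CSV header that have no codebook entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


def undefined_codes(csv_text, codebook):
    """Return (variable, value) pairs whose value is not in the variable's code list."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            codes = codebook.get(name, {}).get("codes")
            if codes is not None and value not in codes:
                problems.append((name, value))
    return problems


print(undocumented_variables(DATA_CSV, CODEBOOK))
print(undefined_codes(DATA_CSV, CODEBOOK))
```

A check like this can run whenever the data file changes, so that the codebook does not drift out of step with the data.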

Metadata are also used by data repositories to make data findable and interoperable, and to exchange metadata for wider discovery. Each data repository uses a metadata standard. Often this is based on DataCite or Dublin Core, with possible extra disciplinary metadata elements. Data repositories usually request the metadata they need via an online submission form. Researchers can then describe in their DMP which data repository they will use and which metadata standard this repository uses. 
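
To give an idea of what such repository metadata look like behind a submission form, here is a minimal sketch that builds a simple Dublin Core record with Python's standard library. The element names (title, creator, date, type, identifier, rights) belong to the Dublin Core element set; the wrapping `metadata` element, the helper function and all values are placeholders chosen for this example, and a real repository will define its own required fields and serialisation.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record with one Dublin Core element per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# All values below are placeholders, not a real dataset.
record = dublin_core_record([
    ("title", "Example survey dataset"),
    ("creator", "Doe, Jane"),
    ("date", "2022-01-27"),
    ("type", "Dataset"),
    ("identifier", "doi:10.1234/placeholder"),
    ("rights", "CC BY 4.0"),
])

print(ET.tostring(record, encoding="unicode"))
```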

Metadata standards

Researchers should use community data standards, metadata standards, minimum information standards and controlled vocabularies wherever that makes sense for their datasets. This can be as simple as using universal date / time stamps, international units, ISO country standards, Medical Subject Headings (MeSH), etc. Metadata standards are very powerful, provided that a system or infrastructure exists that will make use of them or that the research community uses them. Otherwise, researchers can develop their own standards, starting from a simple standard like Dublin Core.
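
The simple end of that spectrum can be sketched in a few lines of Python. The example below rewrites a locally formatted record with an ISO 8601 timestamp and an ISO 3166-1 alpha-2 country code. The record layout, the input date format and the assumption that the time was recorded in UTC are all invented for illustration, and the country table is only a hand-written excerpt.

```python
from datetime import datetime, timezone

# Small hand-written excerpt of ISO 3166-1 alpha-2 codes, for illustration only.
COUNTRY_CODES = {"Belgium": "BE", "Netherlands": "NL", "Germany": "DE"}


def standardise(record):
    """Return a copy of a record with an ISO 8601 timestamp and an ISO country code."""
    # The day/month/year input format is an assumption about the raw data.
    parsed = datetime.strptime(record["sampled"], "%d/%m/%Y %H:%M")
    # Assumes the time was recorded in UTC; real data need their true time zone.
    stamp = parsed.replace(tzinfo=timezone.utc).isoformat()
    return {
        "sampled": stamp,
        "country": COUNTRY_CODES[record["country"]],
        # Putting the unit in the field name keeps it attached to the value.
        "temperature_celsius": record["temperature_celsius"],
    }


raw = {"sampled": "27/01/2022 12:30", "country": "Belgium", "temperature_celsius": 4.5}
print(standardise(raw))
```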

A practical example: OME-TIFF for microscopy images

At the lunch session, researcher Benjamin Pavie (VIB – KU Leuven) showed us a practical example of a discipline-specific metadata standard. He explained how to use OME-TIFF as a standard for microscopy images, to overcome the problem of images existing in multiple proprietary formats that differ from vendor to vendor. OME, the Open Microscopy Environment, was started in 2005 to remedy this situation and emerged as a standard around 2015. A partnership of universities and contributors developed OME-TIFF as a metadata-rich open data format for bio-images. They also developed Bio-Formats, an API with tools to convert proprietary microscopy image data and metadata to OME; OMERO, an image data management platform; and the Image Data Resource (IDR) for publishing and sharing imaging data. Currently, OME-NGFF (Next Generation File Format) is emerging as a way to access large datasets from the cloud.

A practical example: Metadata for FAIR-GNSS 

Data steward Stefanie De Bodt (UGent) showed a second example, of how standards were developed to make Global Navigation Satellite System (GNSS) reference station data more FAIR. A European network of observatories generates instrument metadata for stations and observation metadata from satellite observations. The project started by looking at relevant existing standards such as DataCite, Dublin Core, DCAT, schema.org and NASA Earthdata GCMD keywords. These standards were mapped against the GNSS databases in order to develop a suitable standard for the GNSS data.
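
Mapping existing standards against a database is often written down as a crosswalk table. The sketch below shows the general idea in Python; it does not reproduce the FAIR-GNSS mapping itself. The local field names are hypothetical, and the Dublin Core and schema.org terms chosen for each field are illustrative pairings rather than an authoritative mapping.

```python
# Crosswalk: hypothetical local field -> term in each existing standard.
# None means that no suitable equivalent was found in that standard.
CROSSWALK = {
    "station_name": {"dublin_core": "title", "schema_org": "name"},
    "operator": {"dublin_core": "publisher", "schema_org": "publisher"},
    "install_date": {"dublin_core": "date", "schema_org": "dateCreated"},
    "antenna_type": {"dublin_core": None, "schema_org": None},
}


def unmapped(crosswalk, standard):
    """List the local fields that have no equivalent term in the given standard."""
    return [field for field, terms in crosswalk.items() if terms.get(standard) is None]


def translate(record, crosswalk, standard):
    """Rename the fields of a record to the given standard, skipping unmapped fields."""
    translated = {}
    for field, value in record.items():
        term = crosswalk.get(field, {}).get(standard)
        if term is not None:
            translated[term] = value
    return translated


# Placeholder station record, not real GNSS data.
station = {"station_name": "Example station", "operator": "Example observatory",
           "install_date": "2010-05-01", "antenna_type": "Example antenna"}

print(unmapped(CROSSWALK, "dublin_core"))
print(translate(station, CROSSWALK, "schema_org"))
```

The fields reported by `unmapped` are the ones for which a community would need to add its own disciplinary elements on top of the generic standards.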

Key advice

Key messages from this project, which are good advice for any research project, are:

  • to start as much as possible from existing commonly used standards
  • to start simple, with practical examples and extend later
  • to use technologies that researchers know.

Some final advice: In a DMP, it is more important to describe how a researcher records metadata than to know which standard to use. It can also be enough to indicate in a DMP that researchers will comply with the standards of the data repository where the data will be published, and to be aware of what kind of standard that repository requires.
