Unit 3.0 Metadata activity

Unit overview

Unit study time

  • 30 minutes

Intended Learning Outcomes

By the end of the unit, you will be able to ...

  • Identify key metadata elements
  • Evaluate dataset suitability
  • Apply good metadata practices to support FAIR (Findable, Accessible, Interoperable, Reusable) data principles

Evaluating a dataset using metadata

You are looking for data to support a research project on physical activity in young adults in Wales before and after the COVID-19 pandemic.

Below is a metadata record for a fictional dataset.

Dataset metadata

  • Title: Health and Lifestyle Survey
  • Creator:
  • Date: 2018–2019
  • Geographic coverage: UK
  • Topics: Exercise, diet, demographics, hobbies
  • File format: CSV
  • Documentation: Codebook
  • Licence: Not specified

Task 1: What metadata is useful and what metadata is missing or unclear?

Identify at least two important gaps or ambiguities and two useful aspects of the metadata.

Example answers

Missing or unclear metadata:

  • Creator (who produced the data)
  • Population
  • Sample size
  • Variable list and definitions
  • Methodology (how data was collected)
  • Licensing and access conditions
  • Whether the codebook is accessible
  • Geographic detail

Useful metadata:

  • Topics indicate potential relevance
  • Presence of a codebook (even if not accessible)
  • File format (CSV)
  • Time period
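
The gap check in the example answers above can also be expressed programmatically. The following is a minimal sketch in Python, assuming the record is held as a plain dictionary. The key names, the `EXPECTED` list, the `PLACEHOLDERS` set and the `find_gaps` helper are illustrative assumptions, not part of any metadata standard or tool.

```python
# Fictional metadata record from this activity, held as a plain dictionary.
# Key names are illustrative and not taken from a formal standard.
record = {
    "title": "Health and Lifestyle Survey",
    "creator": "",
    "date": "2018–2019",
    "geographic_coverage": "UK",
    "topics": ["Exercise", "diet", "demographics", "hobbies"],
    "file_format": "CSV",
    "documentation": "Codebook",
    "licence": "Not specified",
}

# Elements a researcher might expect (assumed list, based on the example answers).
EXPECTED = [
    "title", "creator", "date", "geographic_coverage", "topics",
    "file_format", "documentation", "licence",
    "population", "sample_size", "variables", "methodology",
]

# Values treated as carrying no real information.
PLACEHOLDERS = {"", "not specified", "unknown", "n/a"}


def find_gaps(record, expected):
    """Return the expected elements that are absent, empty, or placeholders."""
    gaps = []
    for name in expected:
        value = record.get(name)
        if value is None:
            gaps.append(name)  # element not present at all
        elif isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            gaps.append(name)  # present, but blank or a placeholder
        elif not value:
            gaps.append(name)  # e.g. an empty list
    return gaps


gaps = find_gaps(record, EXPECTED)
print(gaps)
# ['creator', 'licence', 'population', 'sample_size', 'variables', 'methodology']
```

A check like this only finds elements that are absent or empty. It cannot tell that "UK" is too broad for a study of Wales, or that the codebook may not be accessible, so it complements a careful reading of the record rather than replacing it.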

Task 2: Is this dataset suitable for the study?

Based on the metadata, would you use this dataset for your research?

Tip

To decide whether a dataset is suitable for your research, it is helpful to consider:

  • Does it cover the right people? e.g. Population
  • Does it measure the right things? e.g. Variables, concepts, unit type
  • Is it collected in the right way? e.g. Methodology
  • Is it from the right place and time? e.g. Coverage
  • Can I access and use it? e.g. Licensing, access

You can use these questions to guide your answer.

Example answer

No. Based on the available metadata, it is unclear whether this dataset is suitable.

  • The data only covers the pre-pandemic period (2018–2019), so it does not support comparison with post-pandemic data. It also does not state whether similar data was collected later.

  • The population is not described, so it is unclear whether young adults are included.

  • Geographic coverage is broad (UK). While this may include Wales, the record does not say so explicitly and gives no sample size or regional breakdown.

  • Missing creator information makes it difficult to assess credibility.

  • Licensing is unclear, so reuse may not be permitted or may involve restrictions (e.g. fees, application process, secure access).

  • Variables are not listed, so it is unclear whether relevant measures are included.
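
The time-coverage point in the example answer above can be checked mechanically once the date range is parsed. The following is a minimal sketch in Python. It assumes, purely for illustration, that data from before 2020 counts as pre-pandemic and data from 2020 onwards does not; a real study would define its comparison periods more carefully. The `PANDEMIC_START_YEAR` constant and the `parse_year_range` and `covers_both_periods` helpers are hypothetical names.

```python
import re

PANDEMIC_START_YEAR = 2020  # assumed cut-off for this illustration only


def parse_year_range(text):
    """Parse 'YYYY' or 'YYYY–YYYY' (en dash or hyphen) into (start, end)."""
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if not years:
        raise ValueError(f"No year found in {text!r}")
    return min(years), max(years)


def covers_both_periods(date_text, cutoff=PANDEMIC_START_YEAR):
    """Return (has_before, has_after) relative to the cut-off year."""
    start, end = parse_year_range(date_text)
    return start < cutoff, end >= cutoff


has_before, has_after = covers_both_periods("2018–2019")
print(has_before, has_after)  # True False
```

For the fictional record, the result shows data before the cut-off but none from it onwards, which matches the conclusion that the dataset alone cannot support a before-and-after comparison.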


Task 3: What would you do next?

If you wanted to use this dataset, what steps would you take?

Example answer

  • Check whether post-pandemic data exists
  • Look for additional documentation
  • Locate and review the codebook
  • Identify where the dataset is stored (repository or catalogue)
  • Contact the data provider for clarification
  • Search for alternative or additional datasets

Task 4: How could this metadata record be improved?

What specific changes would make this dataset easier to find, and make it easier to decide whether it is suitable for your research?

Example answer

Key improvements (quick summary):

  • Add clear information about the population and who the data represents
  • Describe what is measured and how the data were collected
  • Clarify when and where the data were collected
  • Provide access to documentation (e.g. codebook)
  • Include a stable link or DOI
  • Clearly state how the data can be used (licensing)

Improvements to support suitability decisions:

The metadata should be improved so that a researcher can clearly decide whether the dataset is suitable for their research.

  • Population – Specify who is included (e.g. age range, location)
    → Helps determine whether the data covers the right people

  • Variables and measures – Provide a list of variables and what they represent
    → Helps assess whether the dataset measures the right things

  • Data collection method – Describe how the data were collected
    → Helps evaluate data quality and comparability

  • Time period and geographic coverage – Provide more detail about when and where the data were collected
    → Helps assess whether the data is relevant to the research context

  • Documentation – Ensure the codebook is accessible and includes variable definitions
    → Supports correct interpretation

  • Access and licensing – Clearly state how the data can be accessed and reused
    → Enables decisions about whether the data can be used

Supporting standards and practices:

  • Including persistent identifiers (e.g. DOIs) for reliable access
  • Using common metadata elements (e.g. title, description, coverage) found in standards such as Dublin Core
  • Using controlled vocabularies to improve consistency and searchability
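
To illustrate the Dublin Core point above, the fields of the fictional record can be mapped onto the fifteen elements of the Dublin Core Metadata Element Set. The following is a minimal sketch in Python. The mapping choices (for example, placing the codebook under `relation`) and the `FIELD_TO_DC` name are assumptions made for illustration, not an official crosswalk.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]

# Assumed mapping from the record's field labels to Dublin Core elements.
FIELD_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date",
    "Geographic coverage": "coverage",
    "Topics": "subject",
    "File format": "format",
    "Documentation": "relation",  # judgment call: the codebook as a related resource
    "Licence": "rights",
}

# Dublin Core elements that the record has no field for.
used = set(FIELD_TO_DC.values())
unused = [element for element in DC_ELEMENTS if element not in used]
print(unused)
# ['contributor', 'description', 'identifier', 'language', 'publisher', 'source', 'type']
```

Under this mapping the record has no field for `identifier`, where a DOI would normally go, or for `description`, which is consistent with the improvements listed in the example answer.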

Task 5: Reflection

  • What metadata did you rely on most when finding, understanding, or deciding whether to use the data?
  • What is one thing you will do differently when creating or using metadata after this activity?

Example answers

There is no single correct answer. The goal is to recognise how different types of metadata support your ability to find, understand, and decide whether data is suitable.

You may have thought about how you...

  • needed metadata that helped filter or narrow down options, especially when searching across multiple datasets (e.g. year, topics)
  • looked for metadata that helped you quickly judge relevance (e.g. concepts, title)
  • relied on metadata that gave enough context to understand the data, i.e. what it represented and how it was created (e.g. data collection, methodology, variables)
  • focused on metadata that reduced uncertainty and helped determine data quality: anything that made you more confident about what the data actually contained (e.g. provenance, creator)
  • relied on whatever was easiest to see first, even if it was not the most important for assessing suitability (e.g. title, keywords)
  • focused on metadata that helped you decide whether you could use the data, including within your time and budget (e.g. licence, format, access conditions)

In the future you may try to...

  • think more about what someone else would need to know to decide if the data is suitable
  • look for any relevant controlled vocabularies
  • aim to reduce ambiguity in your metadata by making things explicit rather than assuming knowledge
  • check to see what might be missing
  • be more intentional about what metadata to include, prioritising the information that matters most
  • spend more time on metadata and recognise it as part of the research process, not an add-on
  • refer to metadata standards

Summary

Good-quality metadata is not just descriptive: it supports different stages of the research lifecycle, such as finding relevant datasets at the start of a project. When key metadata is missing or unclear, it becomes difficult to judge whether data are relevant, reliable, or usable. In contrast, clear, well-structured and detailed metadata reduces uncertainty, supports informed decision making, and makes research processes easier and more efficient.