Unit 3.0 Metadata activity

Unit overview

Unit study time

30 minutes

Intended Learning Outcomes

By the end of the unit, you will be able to ...

Identify key metadata elements
Evaluate dataset suitability
Apply good metadata practices to support FAIR (Findable, Accessible, Interoperable, Reusable) data principles

Evaluating a dataset using metadata

You are looking for data to support a research project on physical activity in young adults in Wales before and after the COVID-19 pandemic.

Below is a metadata record for a fictional dataset.

Dataset metadata

Title: Health and Lifestyle Survey
Creator: –
Date: 2018–2019
Geographic coverage: UK
Topics: Exercise, diet, demographics, hobbies
File format: CSV
Documentation: Codebook
Licence: Not specified

Task 1: What metadata is useful and what metadata is missing or unclear?

Identify at least two important gaps or ambiguities and two useful aspects of the metadata.

Example answers

Missing or unclear metadata:

Creator (who produced the data)
Population
Sample size
Variable list and definitions
Methodology (how data was collected)
Licensing and access conditions
Whether the codebook is accessible
Geographic detail

Useful metadata:

Topics indicate potential relevance
Presence of a codebook (even if not accessible)
File format (CSV)
Time period
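
The gap check in Task 1 can also be sketched in code. The example below is a minimal illustration in Python, assuming the record is held as a dictionary. The field names, the list of expected elements and the placeholder values are all hypothetical choices made for this sketch and do not come from any metadata standard.

```python
# The fictional record from this activity, held as a plain dictionary.
# Key names are illustrative, not taken from a metadata standard.
record = {
    "title": "Health and Lifestyle Survey",
    "creator": None,  # shown as a dash in the record
    "date": "2018–2019",
    "geographic_coverage": "UK",
    "topics": ["Exercise", "diet", "demographics", "hobbies"],
    "file_format": "CSV",
    "documentation": "Codebook",
    "licence": "Not specified",
}

# Hypothetical list of elements a researcher might expect to find.
EXPECTED = [
    "title", "creator", "date", "geographic_coverage", "topics",
    "file_format", "documentation", "licence",
    "population", "sample_size", "variables", "methodology",
]

# Values treated as "nothing was recorded" (an assumption for this sketch).
PLACEHOLDERS = {"", "-", "–", "not specified", "unknown"}


def find_gaps(record, expected=EXPECTED):
    """Return the expected elements that are absent or hold a placeholder."""
    gaps = []
    for element in expected:
        value = record.get(element)
        if value is None:
            gaps.append(element)
        elif isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            gaps.append(element)
    return gaps


gaps = find_gaps(record)
print(gaps)
```

A check like this only finds elements that are absent or empty. It cannot tell that "UK" is too broad for a study on Wales, or that a codebook is listed but not accessible, so the judgement asked for in this task is still needed.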

Task 2: Is this dataset suitable for the study?

Based on the metadata, would you use this dataset for your research?

Tip

To decide whether a dataset is suitable for your research, it is helpful to consider:

Does it cover the right people? e.g. Population
Does it measure the right things? e.g. Variables, concepts, unit type
Is it collected in the right way? e.g. Methodology
Is it from the right place and time? e.g. Coverage
Can I access and use it? e.g. Licensing, access

You can use these questions to guide your answer.

Example answer

No. Based on the available metadata, it is unclear whether this dataset is suitable.

The data only covers the pre-pandemic period (2018–2019), so it does not support comparison with post-pandemic data. It also does not state whether similar data was collected later.
The population is not described, so it is unclear whether young adults are included.
Geographic coverage is broad (UK). While this may include Wales, this is not explicit, and the record does not indicate a sample size or regional breakdown.
Missing creator information makes it difficult to assess credibility.
Licensing is unclear, so reuse may not be permitted or may involve restrictions (e.g. fees, application process, secure access).
Variables are not listed, so it is unclear whether relevant measures are included.
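
The five questions from the tip can be recorded as a simple checklist. The following is a minimal sketch in Python. The three answer values ("yes", "no", "unclear"), the wording of the outcomes and the decision rule are assumptions made for illustration, not a formal assessment method.

```python
# Answers to the five suitability questions for the fictional record.
# The values reflect the example answers given in this activity.
answers = {
    "Right people (population)": "unclear",      # population not described
    "Right things (variables)": "unclear",       # variables not listed
    "Right way (methodology)": "unclear",        # collection method not described
    "Right place and time (coverage)": "no",     # 2018–2019 only, no post-pandemic data
    "Can access and use (licence)": "unclear",   # licence not specified
}


def assess(answers):
    """Apply a simple rule: any "no" rules the dataset out as described,
    any "unclear" means more information is needed."""
    values = list(answers.values())
    if "no" in values:
        return "not suitable as described"
    if "unclear" in values:
        return "cannot tell, more information needed"
    return "suitable"


def open_questions(answers):
    """List the questions that are not yet answered with "yes"."""
    return [question for question, answer in answers.items() if answer != "yes"]


print(assess(answers))
print(open_questions(answers))
```

Listing the open questions is often more useful than the overall verdict, because each one points to something to follow up on in Task 3.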

Task 3: What would you do next?

If you wanted to use this dataset, what steps would you take?

Example answer

Check whether post-pandemic data exists
Look for additional documentation
Locate and review the codebook
Identify where the dataset is stored (repository or catalogue)
Contact the data provider for clarification
Search for alternative or additional datasets

Task 4: How could this metadata record be improved?

What specific changes would make this dataset easier to find, and make it easier to decide whether it is suitable for your research?

Example answer

Key improvements (quick summary):

Add clear information about the population and who the data represents
Describe what is measured and how the data were collected
Clarify when and where the data were collected
Provide access to documentation (e.g. codebook)
Include a stable link or DOI
Clearly state how the data can be used (licensing)

Improvements to support suitability decisions:

The metadata should be improved so that a researcher can clearly decide whether the dataset is suitable for their research.

Population – Specify who is included (e.g. age range, location)
→ Helps determine whether the data covers the right people
Variables and measures – Provide a list of variables and what they represent
→ Helps assess whether the dataset measures the right things
Data collection method – Describe how the data were collected
→ Helps evaluate data quality and comparability
Time period and geographic coverage – Provide more detail about when and where the data were collected
→ Helps assess whether the data is relevant to the research context
Documentation – Ensure the codebook is accessible and includes variable definitions
→ Supports correct interpretation
Access and licensing – Clearly state how the data can be accessed and reused
→ Enables decisions about whether the data can be used

Supporting standards and practices:

Including persistent identifiers (e.g. DOIs) for reliable access
Using common metadata elements (e.g. title, description, coverage) found in standards such as Dublin Core
Using controlled vocabularies to improve consistency and searchability
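
As an illustration of using common metadata elements, the sketch below maps the field labels of the fictional record to Dublin Core element names and lists the elements that still have no value. It is a minimal example in Python under stated assumptions: only ten of the fifteen Dublin Core elements are used, the mapping of the codebook to "relation" is a judgement made for this sketch, and empty strings stand in for the missing creator and unspecified licence.

```python
# Field labels from the activity's record mapped to Dublin Core element names.
# Mapping "Documentation" to "relation" is an assumption made for illustration.
FIELD_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date",
    "Geographic coverage": "coverage",
    "Topics": "subject",
    "File format": "format",
    "Documentation": "relation",
    "Licence": "rights",
}

# A subset of the fifteen Dublin Core elements, chosen for this sketch.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "date",
    "format", "identifier", "coverage", "rights", "relation",
]

# The fictional record; empty strings mark values that were not given.
record = {
    "Title": "Health and Lifestyle Survey",
    "Creator": "",
    "Date": "2018–2019",
    "Geographic coverage": "UK",
    "Topics": "Exercise, diet, demographics, hobbies",
    "File format": "CSV",
    "Documentation": "Codebook",
    "Licence": "",
}


def to_dublin_core(record):
    """Return a dictionary keyed by Dublin Core element name."""
    dc = {element: "" for element in DC_ELEMENTS}
    for label, value in record.items():
        element = FIELD_TO_DC.get(label)
        if element:
            dc[element] = value
    return dc


dc_record = to_dublin_core(record)
still_empty = [element for element, value in dc_record.items() if not value]
print(still_empty)
```

The empty elements correspond to improvements already listed in this task: a named creator, a description, a persistent identifier such as a DOI, and a clear rights statement.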

Task 5: Reflection

What metadata did you rely on most when finding, understanding, or deciding whether to use the data?
What is one thing you will do differently when creating or using metadata after this activity?

Example answers

There is no single correct answer. The goal is to recognise how different types of metadata support your ability to find, understand, and decide whether data is suitable.

You may have thought about how you...

needed metadata that helped filter or narrow down options, especially when searching across multiple datasets (e.g. year, topics)
looked for metadata that helped you quickly judge relevance (e.g. concepts, title)
relied on metadata that gave enough context to understand the data, i.e. what it represented and how it was created (e.g. data collection, methodology, variables)
focused on metadata that reduced uncertainty and helped determine data quality, i.e. anything that made you more confident about what the data actually contained (e.g. provenance, creator)
relied on whatever was easiest to see first, even if it was not the most important for assessing suitability (e.g. title, keywords)
focused on metadata that helped decide whether you could use the data, including within your time and budget (e.g. licence, format, access conditions)

In the future you may try to...

think more about what someone else would need to know to decide if the data is suitable
look for any relevant controlled vocabularies
aim to reduce ambiguity in your metadata by making things clearer rather than assuming knowledge
check to see what might be missing
be more intentional about what metadata to include, prioritising the information that matters most
spend more time on metadata and recognise it as part of the research process, not an add-on
refer to metadata standards

Summary

Good-quality metadata is not just descriptive: it supports different stages of the research lifecycle, such as finding relevant datasets at the start of a project. When key metadata is missing or unclear, it becomes difficult to judge whether data are relevant, reliable, or usable. In contrast, clear, well-structured and detailed metadata reduces uncertainty, supports informed decision-making, and makes research processes easier and more efficient.