Unit 2.8 Dataset metadata

Overview

Unit study time

  • 10 minutes

Intended Learning Outcome

By the end of the unit, you will be able to ...

  • Describe what dataset metadata provides and how it differs from study metadata.

What is dataset metadata?

Dataset metadata provides the who, what, when, why, where and how of a single dataset, including specific information about how the data within that dataset was collected.

Larger research projects may produce multiple datasets. These datasets could cover different areas of interest, be collected in different locations, or be collected by a particular researcher. Having metadata for each dataset helps you manage large amounts of data and saves you time in the future.

Many of the metadata elements we use to describe the overall study can be reused at the dataset level.

Which of the metadata elements that we capture at the study level could we also capture at the dataset level?

Repeated metadata elements to capture for datasets

Title: The title of the dataset

Creator: The creator of the particular dataset

Subject: e.g. keywords or topics

Description: e.g. a description of the dataset and what it includes

Contributor: e.g. people or organisations who contributed to the research process

Date: e.g. the date range of when the data for that dataset was collected

Type

Format: The format that the dataset is stored in

Language: The language the dataset is stored in

Relation: Any other publications or resources that are related to that dataset

Coverage: The geographical coverage of the dataset

Access rights: The access rights of the individual dataset
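As a rough illustration of reusing study-level elements, the sketch below copies a study record and then replaces the elements that differ for one dataset. The function name dataset_metadata_from_study, the field names and all values are hypothetical and invented for this example; they are not part of any particular metadata standard or tool.

```python
# Hypothetical study-level metadata; every value here is invented for illustration.
study_metadata = {
    "title": "Example Household Survey",
    "creator": "Example Research Team",
    "language": "English",
    "access_rights": "Open",
}


def dataset_metadata_from_study(study, **overrides):
    """Copy the study-level elements, then replace those that differ for this dataset."""
    record = dict(study)      # shallow copy, so the study record is left unchanged
    record.update(overrides)  # dataset-specific values take precedence
    return record


dataset_metadata = dataset_metadata_from_study(
    study_metadata,
    title="Example Household Survey: Wave 1 interviews",
    creator="A. Researcher",
    date="2021-01-01/2021-03-31",
    coverage="Example Region",
)

for element, value in dataset_metadata.items():
    print(f"{element}: {value}")
```

Elements that are the same for the whole study, such as language here, are inherited unchanged, while elements such as title, creator, date and coverage are set per dataset.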

We can also capture additional metadata elements that we didn't cover in the study metadata to give further information about a dataset.

What additional metadata elements should we collect for a dataset?

New metadata elements to capture for datasets

Kind of data

Study: The study that produced the dataset

Case quantity: The number of data instances that were recorded (for tabular data, this is the number of rows)

Variables: The number of variables in the dataset

Last updated: When the dataset was last updated
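For tabular data, some of these elements can be worked out from the file itself. The following is a minimal sketch, assuming the dataset is a CSV file with a single header row; the function name describe_tabular_dataset, the field names and the sample data are hypothetical and used only for illustration.

```python
import csv
import io
from datetime import date

# Invented sample data standing in for a small tabular dataset.
sample_csv = "respondent_id,age,region\n1,34,North\n2,51,South\n3,27,North\n"


def describe_tabular_dataset(csv_text, study_title, last_updated):
    """Build the dataset-specific metadata elements for a CSV with one header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                    # first row holds the variable names
    rows = [row for row in reader if row]    # skip any blank lines
    return {
        "study": study_title,
        "case_quantity": len(rows),          # one case per data row
        "variables": len(header),            # one variable per column
        "last_updated": last_updated.isoformat(),
    }


record = describe_tabular_dataset(
    sample_csv, "Example Household Survey", date(2021, 4, 1)
)
print(record)
```

Deriving case quantity and variables from the data, rather than typing them by hand, helps keep the metadata in step with the dataset when it is updated.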