Unit 2.8 Dataset metadata
Overview
Unit study time
- 10 minutes
Intended Learning Outcome
By the end of the unit, you will be able to ...
- Describe what dataset metadata provides and how it differs from study metadata.
What is dataset metadata?
Dataset metadata provides the who, what, when, why, where and how of a singular dataset. It provides specific information about how the data within a dataset was collected.
For larger research projects, you may produce multiple datasets. These datasets could have different areas of interest or be carried out in different locations or by a certain researcher. As such, having metadata for each dataset helps you manage vast amounts of data and saves you time in the future.
Many of the metadata elements we use to describe the overall study can be reused at the dataset level.
What metadata elements we capture at the study level could we also capture at the dataset level?
Repeated metadata elements to capture for datasets
Title The title of the dataset
Creator The creator of the particular dataset
Subject e.g. keywords or topics
Description e.g. a description of the dataset and what it includes
Contributor
e.g. people or organisations who contributed to the research process
Date e.g. the date range of when the data for that dataset was collected
Type
Format The format that the dataset is stored in
Language The language the dataset is stored in
Relation Any other publications or resources that are related to that dataset
Coverage The geographical coverage of the dataset
Access rights The access rights of the individual dataset
We can also capture additional metadata elements that we didn't cover in the study metadata to give further information about a dataset.
What additional metadata elements should we collect for a dataset?
New metadata elements to capture for datasets
Kind of data
Study The study that produced the dataset
Case quantity The case quantity refers to number of data instances that were recorded (for tabular data, this is the number of rows)
Variables How many variables are in the dataset
Last Updated When the dataset was last updated