Unit 2.3 Question and measurement metadata

Overview

Unit study time

30 minutes

Intended Learning Outcome

By the end of the unit, you will be able to ...

Describe key metadata elements for questionnaire items and measurements.
Create question‑level metadata including response domains and instructions.
Demonstrate how measurement metadata affects comparability and validity across studies.

In this unit, we explore metadata for two major forms of data collection instruments: questionnaires and measurements. Both produce item‑level data that require similar forms of metadata.

Questionnaires and questions

Questionnaires are one of the most common methods for collecting research data. Without metadata describing the questionnaire and mode of administration, researchers may misinterpret the data or fail to recognise mode‑related bias.

Questionnaire and question metadata describe how each question was presented, who was asked it, and how the respondent could answer.

Why create questionnaire and question metadata

Clarifies what the data represent: Question wording, instructions and response options define exactly what respondents were asked. This prevents misinterpretation of variables and enables more reliable reuse and secondary analysis.
Explains missingness and routing: Skip logic, conditions, and sub‑universes show why some respondents answered or skipped certain questions. This helps users interpret missing data correctly and decide whether the data are suitable for their research question, saving time and resources.
Reveals design choices that affect data quality: Mode of administration (e.g. telephone vs in‑person), use of visual aids, and interviewer involvement can influence how people respond.
Improves transparency and trustworthiness: Documenting how questions were delivered makes the research process clearer and more credible. Providing provenance for each variable clarifies how and why the data were collected. Provenance describes the origins of the data, capturing how, where, and under what conditions a value was produced.
Supports comparability across waves or studies: Small changes in question wording, order or response categories can affect results; metadata make these differences visible.

Questionnaire and question metadata

Let's try creating question metadata for this questionnaire.

[Image: the example questionnaire]

First, we need to describe the question itself. This includes capturing the question text and instruction as they appear on the original questionnaire document.

Question label
The question label is the unique, machine readable code assigned to a question; it does not contain any spaces. It can follow whatever logical format the researcher chooses, provided that each label is unique.

Question name
The question name is similar to the question label but is a human readable tag used to identify a question. Sometimes this tag is included in the original questionnaire file. For example, the questionnaire above numbers its questions, so we can use those numbers as the question names. You can also create question names if they're not available in the questionnaire. The format should follow a logical structure, with a unique name for each question, to avoid confusion.

Question text
The question as written in the original questionnaire. When capturing question text metadata, don't correct any mistakes in the original research materials, such as spelling or grammatical errors. The exact wording of a question can influence how respondents interpret and respond to it, so preserving the original text ensures that your metadata accurately reflect what respondents actually saw.

Question instruction
Sometimes a question is accompanied by a further instruction. For example, question 11 could carry the further instruction 'Please exclude time watching videos on computers.' In this scenario, the sentence is not part of the original question but provides further direction on how to answer it. It would be captured in the 'Question instruction' rather than the 'Question text'.

Response domain
The type of response allowed (e.g. numeric, text, date/time, codelist, scale). These will be described in more detail in the next unit.

Select type
Whether the participant could choose one or multiple options.

Example of how this metadata could be created in practice

Question label	Question name	Question text	Question instruction	Response domain
Q_11	11	How many hours do you spend watching TV, including video and DVDs, on a normal school day?	-	Codelist
Q_12	12	Do you have your own personal mobile phone?	-	Codelist
Q_13	13	How many close friends do you have – friends you could talk to if you were in some kind of trouble?	-	Numeric
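
The table above can also be held in a structured, machine readable form. The following is a minimal Python sketch, not a standard schema: the class name `QuestionMetadata`, its field names, and the helper `check_labels` are illustrative assumptions. It stores the three example questions and checks the two rules given for question labels (no spaces, and unique).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuestionMetadata:
    """Hypothetical record for one question; field names are illustrative."""
    label: str                  # machine readable, unique, no spaces
    name: str                   # human readable tag, e.g. the printed number
    text: str                   # verbatim wording, never corrected
    instruction: Optional[str]  # None when the questionnaire gives none
    response_domain: str        # e.g. "Codelist", "Numeric", "Text"


def check_labels(questions):
    """Return a list of problems found in the question labels."""
    problems = []
    seen = set()
    for q in questions:
        if any(ch.isspace() for ch in q.label):
            problems.append(f"label contains whitespace: {q.label!r}")
        if q.label in seen:
            problems.append(f"duplicate label: {q.label!r}")
        seen.add(q.label)
    return problems


# The three rows from the example table, copied verbatim.
questions = [
    QuestionMetadata(
        "Q_11", "11",
        "How many hours do you spend watching TV, including video and DVDs, "
        "on a normal school day?",
        None, "Codelist",
    ),
    QuestionMetadata(
        "Q_12", "12",
        "Do you have your own personal mobile phone?",
        None, "Codelist",
    ),
    QuestionMetadata(
        "Q_13", "13",
        "How many close friends do you have – friends you could talk to "
        "if you were in some kind of trouble?",
        None, "Numeric",
    ),
]

print(check_labels(questions))
```

An empty list means every label passed both checks. The same records could be written out as CSV or JSON so that the metadata travel with the dataset.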

Response domains

Let's look at response domain types in a bit more detail. In questionnaires and surveys, respondents answer questions in different ways. They might select answers from predefined options or there might be a free text field.

Response domain metadata defines how someone can answer a question. It establishes the range or boundaries of valid responses, which can be either fixed or variable depending on the type of question.

Examples of different response domains are ...

Codelist: When respondents choose from a predefined set of answers, this is called a codelist. A codelist is made up of a set of categories, each represented by a code (we will delve deeper into codes and categories in the next unit).

Numeric: Answers are recorded in numeric format. You could define the allowable values further, for example by specifying whether the answer field accepts decimal places or only numbers within a certain range.

Text: Answers are recorded in text format. You could define this further by specifying whether there is a word limit.

Date/time: Answers are recorded as a date and/or time. If the questionnaire controls how the date and/or time are recorded, you might also want to include that in your metadata. For example, respondents might have to record time in digital format or a date in DD/MM/YYYY format.

If you only had access to the above metadata and not the original questionnaire, what additional information might you need to fully understand how responses were recorded and interpreted?

Additional information for interpreting response domains

For numeric answers, we may need details about precision (e.g. whether decimal places or negative values are allowed) to understand how measurements were recorded.
For text answers, knowing whether there were word limits or guidance can help interpret how detailed or constrained responses might be.
For codelists, we need the categories and their corresponding codes, as well as how many responses could be selected, to correctly interpret and analyse the data.
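
To see why these extra details matter, here is a rough Python sketch of response domains that carry them. Everything in it is an assumption made for illustration: the class names, the `is_valid` method, and in particular the codes `1` = "Yes" and `2` = "No", which do not come from the example questionnaire.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NumericDomain:
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    decimal_places: int = 0  # 0 means whole numbers only

    def is_valid(self, value) -> bool:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        # Rounding to the allowed precision must leave the value unchanged.
        return round(value, self.decimal_places) == value


@dataclass(frozen=True)
class CodelistDomain:
    categories: Dict[int, str]     # code -> category
    select_multiple: bool = False  # the "select type"

    def is_valid(self, codes) -> bool:
        codes = list(codes)
        if not codes:
            return False
        if not self.select_multiple and len(codes) != 1:
            return False
        no_repeats = len(set(codes)) == len(codes)
        return no_repeats and all(c in self.categories for c in codes)


@dataclass(frozen=True)
class TextDomain:
    max_words: Optional[int] = None  # None means no word limit

    def is_valid(self, value) -> bool:
        if not isinstance(value, str):
            return False
        return self.max_words is None or len(value.split()) <= self.max_words


# Hypothetical domains for Q_13 and Q_12.
close_friends = NumericDomain(minimum=0, decimal_places=0)
own_phone = CodelistDomain(categories={1: "Yes", 2: "No"})

print(close_friends.is_valid(3))    # whole, non-negative number
print(close_friends.is_valid(2.5))  # decimals not allowed here
print(own_phone.is_valid([1, 2]))   # two codes on a single-select question
```

Without the precision, range, codes and select type recorded in the metadata, none of these checks could be reconstructed from the data alone.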

Next we can examine the rest of the questionnaire metadata.

Questionnaire name
A meaningful title used to identify the questionnaire from which the questions originate.

Conditions
Rules that determine whether a respondent should answer a question (e.g. “If Q_4 = Yes, go to Q_5”). Also known as skip logic, routing, or filtering.

Statements
Additional text contained within the questionnaire (e.g. “The next questions will ask about your physical health”).

Flow
The flow describes the order and structure of questions and related elements (e.g. instructions, conditions, and statements) in a questionnaire. It shows how respondents move through the survey, including any routing or skip logic. Documenting flow helps preserve how the questionnaire was experienced, which is important because question order and structure can influence responses.

Sub‑universe
The subset of respondents who are eligible to answer a question because of a condition or routing instruction. For example, some sections of a questionnaire may only be relevant to people who are retired, so a condition placed before those sections routes only retired respondents to them. This narrows the universe from 'All persons aged 18 and over' to 'All retired persons aged 18 and over'. Sub‑universes describe who had the opportunity to provide data and are a key part of question metadata because they explain routing, eligibility, and why some respondents never receive certain questions. This is important when considering missing values, which we will explore further in the variables unit.
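
The link between conditions, sub‑universes and missing values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a routing engine: it uses the “If Q_4 = Yes, go to Q_5” condition from above, and the respondents, the answer "Weekly", and the names `in_sub_universe` and `classify` are all hypothetical.

```python
NOT_APPLICABLE = "not applicable"  # routed past the question
NOT_ANSWERED = "not answered"      # eligible, but gave no answer
ANSWERED = "answered"


def in_sub_universe(condition, answers):
    """condition is (source_label, required_value); None means everyone."""
    if condition is None:
        return True
    source_label, required_value = condition
    return answers.get(source_label) == required_value


def classify(question_label, condition, answers):
    """Say why a respondent does or does not have a value for a question."""
    if not in_sub_universe(condition, answers):
        return NOT_APPLICABLE
    if answers.get(question_label) is None:
        return NOT_ANSWERED
    return ANSWERED


# Condition taken from the example: "If Q_4 = Yes, go to Q_5".
q5_condition = ("Q_4", "Yes")

# Hypothetical respondents and answers.
respondents = {
    "r1": {"Q_4": "Yes", "Q_5": "Weekly"},
    "r2": {"Q_4": "No"},
    "r3": {"Q_4": "Yes"},
}

statuses = {
    rid: classify("Q_5", q5_condition, answers)
    for rid, answers in respondents.items()
}
print(statuses)
```

Respondents r2 and r3 both lack a value for Q_5, but for different reasons: r2 was never asked, while r3 was eligible and did not answer. Only the condition metadata lets an analyst tell these two kinds of missingness apart.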

The way questionnaire metadata are created will depend on how the questionnaire is administered. For example, if you use online survey software to deliver a questionnaire, you could use the software to help automate metadata creation for both the questionnaire and the variables, saving you time and resources. However, if a questionnaire is created as a Word document and administered as a hard copy, you would need to recreate the metadata in a structured form.

Measurements

Not all data come from written questions. Many research projects involve measured values, such as height, weight, blood pressure, soil pH, or nitrogen content.

Measurement-based data collection requires metadata because:

Instruments vary in precision and accuracy
Methods influence measurement error
Conditions (time, location, environment) affect results
Different devices may produce different outputs
Without documentation, values cannot be reliably compared across studies or over time

While questionnaires are widely used across many disciplines, instrument and measurement metadata can be much more domain‑specific. Different fields rely on specialised tools (e.g. laboratory assays, soil probes, medical devices, environmental sensors), which vary in their operating procedures. Because of this, the type of metadata required to document instruments or measurements can vary considerably between domains.

For this reason, it is not possible to define all the metadata elements that may be required for every dataset to be useful. What remains consistent, however, is the need to record enough detail about the method, device, and conditions to allow others, and your future self, to understand how the measurement was taken and how comparable it is to similar data collected elsewhere. You can do this by asking yourself 'what information do I need to use this data?' and by identifying any existing guidelines, standards and schemas that you can follow. For example, the National Geophysical Data Center (NGDC) provides a metadata template for creating and documenting seismic metadata.

Instrument and measurement metadata

Instrument and measurement metadata describes the instrument, the method, and the conditions under which a value was recorded. Key elements include:

Instrument name
The specific name of the device used (e.g. “Hanna HI1292D pH probe”, “Seca Model 213 stadiometer”).

Instrument type
The category of device (e.g. digital thermometer, GPS unit, balance).

Instrument description
Brief explanation of its purpose and characteristics.

Instrument version/model
Important when different versions yield different precision.

Once the instrument itself is documented, we also need metadata that describes the specific measurement it produced, including what was measured, how it was taken, and under what conditions.

Measurement label/name
Identifiers for the measurement, equivalent to question labels and question names.

Measurement description
Explanation of what was measured (e.g. “Soil pH in Plot A”).

Measurement instruction
The protocol used (e.g., “Insert probe 10 cm into soil”).

Measurement method
How the measurement was taken, including any calibration.

Response domain
Usually numeric, but can include text or categories.

Measurement conditions
Time, location, environmental context, and other factors that influence values.
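
As with questions, these elements can be recorded in a structured form and then used to judge comparability. The Python sketch below is one possible layout rather than a domain standard. The instrument name, the description “Soil pH in Plot A” and the instruction “Insert probe 10 cm into soil” come from the examples above; the Plot B record, the method and conditions text, and the names `MeasurementMetadata` and `differing_fields` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MeasurementMetadata:
    """Hypothetical record for one measurement; field names are illustrative."""
    label: str
    description: str
    instrument_name: str
    instrument_type: str
    instruction: str
    method: str
    response_domain: str
    conditions: str


def differing_fields(a, b, ignore=("label", "description")):
    """List the metadata elements on which two measurements differ.

    Labels and descriptions identify a measurement rather than describe
    how it was taken, so they are ignored by default.
    """
    return [
        f.name
        for f in fields(a)
        if f.name not in ignore and getattr(a, f.name) != getattr(b, f.name)
    ]


plot_a = MeasurementMetadata(
    label="PH_A",
    description="Soil pH in Plot A",
    instrument_name="Hanna HI1292D pH probe",
    instrument_type="pH probe",
    instruction="Insert probe 10 cm into soil",
    method="Direct reading after calibration",    # hypothetical
    response_domain="Numeric",
    conditions="Morning, dry weather",            # hypothetical
)

# Hypothetical second measurement taken at a different depth.
plot_b = MeasurementMetadata(
    label="PH_B",
    description="Soil pH in Plot B",
    instrument_name="Hanna HI1292D pH probe",
    instrument_type="pH probe",
    instruction="Insert probe 20 cm into soil",
    method="Direct reading after calibration",
    response_domain="Numeric",
    conditions="Morning, dry weather",
)

print(differing_fields(plot_a, plot_b))
```

Here the only difference is the instruction, which flags that the two plots were probed at different depths. Whether that difference matters for comparability is a judgement for someone who knows the domain; the metadata simply make it visible.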

Test your knowledge

Why is it important to document how a variable is measured?

It reduces file size
It ensures consistent naming of datasets
It helps users correctly interpret what the variable represents
It guarantees the data are complete

Reveal answer

Without clear measurement information, users may misunderstand what a variable represents or how it should be used.

Which of the following is an example of question or measurement metadata?

The number of observations in a dataset
The definition of “household income” used in a study
The storage format of the file
The software used to analyse the data

Reveal answer

Definitions of variables (e.g. what is included in “income”) are key parts of measurement metadata.

What is the main risk if question and measurement metadata are missing?

The dataset cannot be opened
Variables may be misinterpreted
The dataset becomes too large
Data cannot be visualised

Reveal answer

Without metadata describing how variables were defined and measured, users may misinterpret results.

Why is it important to document changes in question wording across waves of a study?

It reduces the number of variables
It ensures consistent file formats
It helps identify whether observed changes are real or due to measurement differences
It speeds up data processing

Reveal answer

Changes in wording can affect responses, so documenting them helps distinguish real change from measurement effects.

How do question and measurement metadata support valid comparisons across datasets?

By ensuring identical file formats
By reducing dataset size
By allowing users to assess whether variables are defined and measured in the same way
By removing the need for documentation

Reveal answer

These metadata help users confirm whether variables are comparable across datasets, which is essential for valid analysis.

How do question and measurement metadata interact with unit type and population when comparing datasets?

They are unrelated concepts
They only affect file structure
Together they ensure both what is measured and who it applies to are comparable
They duplicate the same information

Reveal answer

Measurement metadata explain what is being measured, while unit type and population define who or what is being studied; both are needed for valid comparisons.

Summary

In this unit, we explored how questionnaires and measurement instruments shape the data we collect, and why documenting these details is essential for understanding, interpreting, and reusing research data. By capturing metadata about the instrument, mode, instructions, response domains, routing, and the entities involved, we create a transparent record of how each value in a dataset came to be, i.e. its provenance. This documentation strengthens data quality, clarifies eligibility and missingness, and links each question or measurement to the variables it produces.

In the next unit, we will look more closely at response domains and more specifically the codes and categories used in response domains.