Unit 1.1 What is research data?
Unit overview
Unit study time
- 15 minutes
Intended Learning Outcomes
By the end of the unit, you will be able to ...
- Understand what data is and how it can be represented
- Identify the characteristics of research data
- Recognise the key components that make up research data (concept → measure → data)
What is data?
In the Introduction course, we'll cover how we can manage and document research data using metadata. Before we do this, it's important we have a strong understanding of what data is, the characteristics of data, and the role of data in research.
We use the term "data" all the time, but what do we actually mean?
We constantly engage with data in our everyday lives. For example, consider a weather app we can find on our phones.
What data is available in the weather app screenshot below [1]?
Data on the weather app
- Temperature
- Precipitation
- Cloud cover
In this screenshot, the data describes weather conditions in London.
Representation of data
In the weather app, we gain information about the measurement of weather conditions in London by interpreting the data. We interpret different types of data to do this.
For example, we interpret data that is ...
- Numbers e.g. 9
- Text e.g. 'Light showers'
- Symbols e.g. 🌧️
- Date/Time e.g. 19:00
We can also interpret the same information about the same concept using different representations of data. For example, we can gain information about cloud coverage through the following representations of data...
- Text e.g. 'Partially cloudy'
- Symbols e.g. 🌥️
- Numbers e.g. 4 oktas (the okta is a unit used to describe the amount of cloud cover, on a scale from 0 to 8)
Data doesn't come in one set format. Instead, there are different representations of data which can convey the same information. By interpreting these representations, we gain meaning from the data. [2]
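As a rough illustration of this idea, the sketch below stores one cloud cover observation as a number of oktas and then produces the text and symbol representations from it. The class name `CloudCover`, the method names, and the cut-off points used for the text and symbol labels are assumptions made for this example only; they are not an official classification and are not taken from the weather app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudCover:
    """One cloud cover observation, stored as a number of oktas (0 to 8)."""

    oktas: int

    def __post_init__(self) -> None:
        # The okta scale runs from 0 to 8, so reject anything else.
        if not 0 <= self.oktas <= 8:
            raise ValueError("oktas must be between 0 and 8")

    def as_number(self) -> str:
        """Numeric representation, e.g. '4 oktas'."""
        return f"{self.oktas} oktas"

    def as_text(self) -> str:
        """Text representation (illustrative labels, assumed for this sketch)."""
        if self.oktas == 0:
            return "Clear sky"
        if self.oktas == 8:
            return "Overcast"
        return "Partially cloudy"

    def as_symbol(self) -> str:
        """Symbol representation (illustrative choice of symbols)."""
        if self.oktas == 0:
            return "☀️"
        if self.oktas == 8:
            return "☁️"
        return "🌥️"


# One observation, three representations carrying the same information.
observation = CloudCover(oktas=4)
representations = [
    observation.as_number(),
    observation.as_text(),
    observation.as_symbol(),
]
```

All three entries in `representations` describe the same concept (cloud cover); only the form of the data differs.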
Definition of data
How can we define data?
Data has different definitions in different contexts and disciplines. Let's explore some definitions below.
Wikipedia offers a broad understanding of data as ...
"...a collection of discrete or continuous values that convey information describing the quantity, quality, fact, statistics, or other basic units of meaning, or sequences of symbols that may be further interpreted formally" [3]
Taking a more business context, IBM defines data as ...
'...a collection of facts, numbers, words, observations or other useful information. Through data processing and data analysis, organizations transform raw data points into valuable insights that improve decision-making and drive better business outcomes.' [4]
CODATA conceives data as...
'Facts, measurements, recordings, records, or observations about the world, collected by researchers, that are yet to be processed/interpreted/analysed. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records.' [5]
While each definition has a slightly different focus, a common theme is that data comes in multiple representations, and it can be interpreted in order to provide information about the topic or object the data is describing.
Research data
Data used for research is called research data. As it is used in a specific context and for specific aims, research data has a narrower definition.
CODATA defines research data as...
'Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third-party data, public sector data, monitoring data, processed data, or repurposed data.' [6]
Therefore, while data in research still conveys information (as the previous definitions noted), it does so in relation to a specific enquiry or activity.
While research data can come in lots of different representations, all research data has the same underlying purpose of being used as evidence in a research process.
Research data is collected, observed or measured by an individual or an organisation, with the intention of finding out more about a particular concept. For example, a meteorologist or the Met Office will collect data about the weather in different locations and at different times.
Exploring research data
Let's explore research data further.
Research data is the outcome of a process which could be conceived as ...
flowchart LR
sitw(Something in the world) --> Rec(Which is recorded)
Rec --> Rep(Has a representation)
If the 'representation' is the data, what could the first two boxes be? How do we define the 'something in the world' our research looks at? How do we record it?
flowchart LR
Q1( ? ) --> Q2( ? )
Q2 --> D(Data)
Collecting research data
'Something in the world' is the concept we are researching. To record it, we measure the concept and this measurement produces our research data.
flowchart LR
sitw(Something in the world) --> Rec(Which is recorded)
Rec --> Rep(Has a representation)
flowchart LR
C(Concept) --> Me(Measure)
Me --> D(Data)
This process reflects how we collect research data. First, we generate a hypothesis on a particular concept (something in the world), then we measure the concept to test the hypothesis (which is recorded), and finally we produce data from our measurements (has a representation). The concept you study influences the measurement you choose, and the measurement determines the data you collect.
Practice activity
If you're a researcher who wants to record the age of every participant in a study, what is the concept → measure → data?
What is the concept we want to capture?
Age of person.
How do we measure age?
The way we conduct this measurement is our data collection method. For example, we may measure age by asking research participants the question 'What is your age?' in a questionnaire. The questionnaire is then our measurement tool.
How would we represent that data?
We could use years, months, days, minutes or seconds to record data about age. If we used one of these units of measurement, we would create numeric data. If we use years as the unit of measurement, the valid range of numbers would run from 0 (as your age cannot be negative) to roughly 115, which is close to the upper limit of the human lifespan rather than the average. The numeric data collected in our research will make up our dataset.
A person's age as research data: concept → measure → data
flowchart LR
aop(Age of person <br> <i>Concept</i>) --> Ye(Years <br> <i>Measure</i>)
Ye --> yd(0...115 <br> <i>Data</i>)
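The valid range for age in years can also be written down as a simple check on each recorded value. The sketch below is a minimal illustration only: the function name `is_valid_age_in_years` and the sample `responses` are hypothetical and do not come from a real study, and the limits of 0 and 115 are taken from the example above.

```python
def is_valid_age_in_years(value, minimum=0, maximum=115):
    """Return True if value is a whole number of years within the valid range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return minimum <= value <= maximum


# Hypothetical questionnaire responses, invented for illustration.
responses = [34, -2, 115, 130, "forty"]

# Keep only the responses that fall inside the valid range.
valid_ages = [r for r in responses if is_valid_age_in_years(r)]
print(valid_ages)  # [34, 115]
```

Here -2 and 130 fall outside the range, and "forty" is text rather than numeric data, so only 34 and 115 would enter the dataset.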
Thinking back to the weather app, identify the concept → measure → data for the data collected on the app.
flowchart LR
C(Concept) --> Me(Measure)
Me --> D(Data)
flowchart LR
B1( ) --> B2( )
B2 --> B3( )
Temperature
You may have identified the concept → measure → data for the temperature in London.
flowchart LR
tol(Temperature of London) --> de(Thermometer)
de --> td("-20...45 degrees")
Wind gusts
You may have identified the concept → measure → data for wind gusts in London.
flowchart LR
wgl(Wind gusts in London) --> mph(Anemometer)
mph --> mphd("0...100 miles per hour")
Practice activity
Let's try another example in a research context.
You are doing research into young people's relationship with social media. As part of this research, you need to find out how long each participant spends on a social media platform per day. What is concept → measure → data for this research?
flowchart LR
B1( ) --> B2( )
B2 --> B3( )
Concept → measure → data flow
One answer could be...
flowchart LR
tsosm(Time Spent on Social Media) --> min(Survey question)
min --> d(0...24 hours)
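The examples worked through in this unit can also be written down as simple records, which is one way to keep track of the concept, measure and data for each variable in a study. This is only a sketch: the class name `DataFlow` and its field names are assumptions made for this example, while the values are copied from the flowcharts in this unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """One concept -> measure -> data chain for a single variable."""

    concept: str  # the 'something in the world' being studied
    measure: str  # how the concept is recorded
    data: str     # the representation that results

    def describe(self) -> str:
        return f"{self.concept} -> {self.measure} -> {self.data}"


# Values taken from the flowcharts in this unit.
EXAMPLES = [
    DataFlow("Age of person", "Years", "0...115"),
    DataFlow("Temperature of London", "Thermometer", "-20...45 degrees"),
    DataFlow("Wind gusts in London", "Anemometer", "0...100 miles per hour"),
    DataFlow("Time spent on social media", "Survey question", "0...24 hours"),
]

for example in EXAMPLES:
    print(example.describe())
```

A new `DataFlow` entry could be added to the list for each variable in your own research.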
Research data in your work
If you're currently doing research, try to identify the concept → measure → data for your work.
flowchart LR
B1( ) --> B2( )
B2 --> B3( )
Test your knowledge
- Complete the missing step: Concept → ??? → data
- True or false: different data using different units of measurement can convey the same information
- What role does data have in research?
Answer
- Complete the missing step: Concept → measure → data
- True: different data using different units of measurement can convey the same information
- Data is used as evidence in the research process and supports and validates research findings.
Further learning
If you'd like to learn more about research data, you can explore these training modules...
- MANTRA provides an open online training module on 'Research Data in Context'
- Queen Mary University provides an open training module on 'Research Data Explained'
References
- [1] Met Office (2026) Met Office Weather App [Mobile app]. Available at: App Store or Google Play Store (https://www.metoffice.gov.uk/)
- [2] University of Cambridge (2025) What is research data?
- [3] Wikipedia (2025) Data
- [4] IBM (2025) What is Data?
- [5] CODATA (2025) RDM Terminology: Data
- [6] CODATA (2025) RDM Terminology: Research data