
Unit 2.6 Concept metadata

Overview

Unit study time

  • 20 minutes

    Intended Learning Outcomes

    By the end of the unit, you will be able to:

    • Define what a concept is in the context of research metadata.
    • Explain how concept metadata supports semantic interoperability.

What is a concept?

Concepts are a fundamental component of data collection. Think back to the Concept -> Measure -> Data flow in Unit 1.1 of the Introduction to Metadata course. Concepts form the semantic foundation of research data: they describe what is being measured.

But what is a concept?

A concept is a unit of meaning: an idea with a name and a formal definition, made up of a unique combination of characteristics.

For example...

  • Person
  • Apple
  • Lion
  • Gender
  • Marital status
  • Employment status
  • Satisfaction with healthcare
  • Academic discipline

If a concept is a unit of meaning, how do we define that unit?

For example ...

What is a lion?

[!NOTE] Add reference as this looks like it was taken from DDI training slides

We define concepts through their characteristics, and it is these characteristics that let us distinguish one concept from another.

For example...

| Tiger | Lion |
|---|---|
| Large feline | Large feline |
| Coarse fur | Coarse fur |
| Orange and black colored | Tawny colored |
| Mostly solitary (a group is called a streak) | Lives in groups called prides |
| Lives in East and South Asia | Lives in Africa or North West India |
| Stripes | No stripes |

Concepts can be broad (e.g. big cats) or narrow (e.g. lion). A narrower concept that shares the characteristics of a broader one can be treated as a sub-concept of that broader concept.

For example, the concepts ...

  • Lion
  • Tiger
  • Panther
  • Jaguar

can be considered sub-concepts of big cats.
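To make the idea of defining characteristics concrete, the minimal Python sketch below models a concept as a name plus a set of characteristics, taken from the tiger and lion comparison above. It assumes, as a simplification, that a sub-concept carries every characteristic of its broader concept plus some of its own. The `Concept` class, the `is_subconcept_of` and `distinguishing` helpers, and the single 'large feline' characteristic given to big cats are hypothetical illustrations, not part of any metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept: a name plus the characteristics that define it."""
    name: str
    characteristics: frozenset


def is_subconcept_of(narrow: Concept, broad: Concept) -> bool:
    # Assumption: a narrower concept has every characteristic of the broader one.
    return broad.characteristics <= narrow.characteristics


def distinguishing(a: Concept, b: Concept) -> set:
    # Characteristics held by one concept but not the other.
    return set(a.characteristics ^ b.characteristics)


big_cats = Concept("Big cats", frozenset({"large feline"}))
tiger = Concept("Tiger", frozenset({"large feline", "coarse fur",
                                    "orange and black colored", "stripes"}))
lion = Concept("Lion", frozenset({"large feline", "coarse fur", "tawny colored"}))

print(is_subconcept_of(lion, big_cats))    # True
print(is_subconcept_of(big_cats, lion))    # False
print(sorted(distinguishing(tiger, lion)))  # ['orange and black colored', 'stripes', 'tawny colored']
```

The broader concept, big cats, is defined by fewer characteristics than lion or tiger; adding characteristics narrows a concept, and the characteristics two concepts do not share are what tell them apart.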

Concept metadata

Where would you include concept information in your metadata?

Because all research aims to measure one or more concepts, you can create concept metadata at a broad level (e.g. in your project or dataset metadata), at a granular level (e.g. in your variable, question and category metadata), or both.

Variable, question and category metadata: a question, a variable, or a group of variables is designed to measure and describe a particular concept. For example, the question 'What is your marital status?' is designed to capture the concept 'marital status' for a person. This means you can add concept metadata to question and variable metadata. In your codelist metadata, you can also record the concept for each category.

Dataset and project metadata: because concepts can also be broad, you can capture concept metadata at a higher level, for the research project as a whole or for each dataset. This records the main topic that the research or dataset relates to.
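As an illustration of where concept metadata can sit, the sketch below attaches a broad concept to a dataset and granular concepts to a variable and to each category in its codelist. It is a simplified Python model written for this unit: the class and field names (`concept`, `topic_concept` and so on) are hypothetical and do not reproduce DDI or any other standard, the question text is stored on the variable for brevity, and the code-to-label pairing (S for Single, M for Married, and so on) is an assumption. All values are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    code: str
    label: str
    concept: str  # concept the category itself represents (hypothetical field)


@dataclass
class Variable:
    name: str
    question_text: str
    concept: str  # concept the question and variable are designed to measure
    categories: list = field(default_factory=list)


@dataclass
class Dataset:
    title: str
    topic_concept: str  # broad, dataset-level concept
    variables: list = field(default_factory=list)


marital = Variable(
    name="MS_22",
    question_text="What is your marital status?",
    concept="Marital status",
    categories=[
        Category("S", "Single", "Single"),
        Category("M", "Married", "Married"),
        Category("D", "Divorced", "Divorced"),
        Category("W", "Widowed", "Widowed"),
    ],
)

dataset = Dataset(
    title="Dataset 1",
    topic_concept="Mental health in adults",
    variables=[marital],
)

# The broad concept sits on the dataset; granular ones on the variable and categories.
print(dataset.topic_concept)                    # Mental health in adults
print(dataset.variables[0].concept)             # Marital status
print([c.concept for c in marital.categories])  # ['Single', 'Married', 'Divorced', 'Widowed']
```

Keeping the dataset topic separate from the variable concept means a dataset about mental health can still be found by someone looking for marital status data.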

Concept metadata and interoperability

Let's unpack how concept metadata makes different metadata more interoperable.

Codelists

Across different datasets, you might have different codelists to capture information about the same concept.

| Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 |
|---|---|---|---|
| S | Single | 1 | Never married |
| M | Married | 2 | Married |
| D | Divorced | 3 | Divorced |
| W | Widowed | 4 | Widowed |
| | | 5 | Separated |
| | | | Engaged |

While each codelist has different categories (indicating different sub-concepts for marital status), the concept metadata tells us that they all relate to marital status. So even when data is collected in different ways, we can identify when data might be comparable or related to each other. It can also help us understand how a particular concept is defined, and whether that definition has evolved over time or changes in different contexts.
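The sketch below shows how a shared concept annotation lets software group related codelists even though their categories differ. It uses the category labels from Datasets 2 and 4 in the table above and assumes each label also names its category concept; the dictionary layout and the `compare_by_concept` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical codelists: each records the concept it measures and its categories.
codelists = [
    {"dataset": "Dataset 2", "concept": "Marital status",
     "categories": ["Single", "Married", "Divorced", "Widowed"]},
    {"dataset": "Dataset 4", "concept": "Marital status",
     "categories": ["Never married", "Married", "Divorced", "Widowed",
                    "Separated", "Engaged"]},
]


def compare_by_concept(lists):
    """Group codelists by concept and report shared and dataset-specific categories."""
    grouped = defaultdict(list)
    for cl in lists:
        grouped[cl["concept"]].append(cl)

    report = {}
    for concept, members in grouped.items():
        sets = [set(m["categories"]) for m in members]
        shared = set.intersection(*sets)
        report[concept] = {
            "datasets": [m["dataset"] for m in members],
            "shared": shared,
            # Categories that appear in one codelist but not in all of them.
            "specific": {m["dataset"]: set(m["categories"]) - shared for m in members},
        }
    return report


result = compare_by_concept(codelists)["Marital status"]
print(result["datasets"])        # ['Dataset 2', 'Dataset 4']
print(sorted(result["shared"]))  # ['Divorced', 'Married', 'Widowed']
```

The dataset-specific categories (for example 'Single' versus 'Never married') are exactly where a reader needs the category concept definitions to judge whether the data can be compared.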

Variables

Variables across different datasets and research projects may measure the same concept. However, because they often have different names and labels, we would not necessarily know that they are comparable unless we explored each dataset in depth. This is a time-consuming process, and we would still be likely to miss comparable variables.

For example, the variables...

  • Dataset 1: MS_22
  • Dataset 2: prtnr
  • Dataset 3: signif_othr
  • Dataset 4: relationshipstat

all measure marital status. If the overall project or dataset was not focused on marital status (e.g. Dataset 1 could be collecting data about mental health in adults, whereas Dataset 4 could be collecting data about access to adult education services), we would not necessarily know that those datasets contain variables we could use and compare in further research.

Therefore, by stating the concept of each variable, we provide a quick and easy way to understand what data is included in a project, even if it is not the main focus of the project.
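As a rough illustration, the sketch below indexes the four variables above by the concept each one records, so a single lookup finds them despite their different names. The record layout and the `build_concept_index` and `find_variables` functions are hypothetical, and the `wellbeing_score` variable is an invented example added only to show that unrelated variables are filtered out.

```python
from collections import defaultdict

# Variable-level metadata: names differ, but each record states the concept it measures.
variables = [
    {"dataset": "Dataset 1", "name": "MS_22", "concept": "Marital status"},
    {"dataset": "Dataset 1", "name": "wellbeing_score", "concept": "Mental wellbeing"},  # hypothetical
    {"dataset": "Dataset 2", "name": "prtnr", "concept": "Marital status"},
    {"dataset": "Dataset 3", "name": "signif_othr", "concept": "Marital status"},
    {"dataset": "Dataset 4", "name": "relationshipstat", "concept": "Marital status"},
]


def build_concept_index(records):
    """Map each concept to the (dataset, variable name) pairs that measure it."""
    index = defaultdict(list)
    for rec in records:
        index[rec["concept"]].append((rec["dataset"], rec["name"]))
    return index


def find_variables(index, concept):
    # .get() does not insert a key, so unknown concepts simply return an empty list.
    return index.get(concept, [])


index = build_concept_index(variables)
for dataset, name in find_variables(index, "Marital status"):
    print(dataset, name)
```

In practice the concept would be better referenced by a stable identifier than by a free-text label, since two catalogues may word the same label differently.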

Questions

  • Are you currently married?
  • What is your marital status?
  • Are you in a legal partnership?
  • What is your legal partnership status?

Even though each question is worded differently, the concept that the data measures, marital status, remains the same.

Each category that a code represents in a codelist is also a concept in its own right, for example...

  • Married
  • Divorced
  • Single
  • Widowed
  • Separated
  • Engaged

This means you can have broader concepts which have fewer characteristics defining them, and narrower concepts that have more characteristics.

For example, marital status refers to a person's situation with regard to whether one is single, married, separated, divorced, or widowed.[^1]

The concept 'married', by contrast, refers specifically to people who are united in marriage and have signed a legal document confirming their union.

[^1]: Oxford Dictionary

The meaning of a concept can also change over time. For example, over the past 20 years, the definition of marital status has evolved to include situations such as civil partnerships, or being separated but still married. Take a look at the Office for National Statistics (ONS) to see the categories it includes in its legal partnership codelist.

What are the benefits of documenting concept metadata?

Documenting concept metadata provides clear definitions of what a variable means, not just what it is called. It is fundamental for making research data interoperable, comparable, and reusable across projects, time periods, and contexts. While humans can often infer meaning from experience or context, machines cannot, and even humans frequently interpret terms differently across disciplines, cultures, and eras. Concept metadata bridges these gaps.

Benefits of concept metadata

  • Enables reuse and cross-study comparison

Even if datasets from different projects used different data collection methods, or were collected at different times or in different places, they may still contain comparable data. Well-defined concept metadata makes it possible to recognise when two variables represent the same underlying idea. This opens up space for new research and findings beyond the scope of the original projects, so the data goes further and supports new investigations.

  • Reduces ambiguity and misinterpretation

When concepts and their definitions are documented, users can understand what is being measured and are less likely to misinterpret the data. Concepts can seem 'obvious', which invites assumptions, but as shown above, interpretation can vary with cultural context, social constructs and local conventions, for example.

  • Supports machine actionable metadata and interoperability

Machines cannot infer meaning from context; they need explicit definitions and relationships. Making these explicit is essential for automating data management processes and making them more efficient, and it is vital for semantic interoperability.
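One widely used way to give machines explicit definitions and relationships is the W3C Simple Knowledge Organization System (SKOS) vocabulary. The sketch below builds a small JSON-LD description of the marital status concept and its narrower category concept 'married', using the SKOS properties `prefLabel`, `definition`, `broader` and `narrower`. The `https://example.org/concepts/...` identifiers are placeholders, the definition is adapted from the one quoted earlier in this unit, and the `broader_of` helper is a hypothetical plain-dictionary lookup rather than a real RDF tool; this is an illustration, not a prescribed format for your metadata.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/concepts/"  # placeholder namespace for illustration

graph = {
    "@context": {"skos": SKOS},
    "@graph": [
        {
            "@id": BASE + "marital-status",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "Marital status", "@language": "en"},
            "skos:definition": "A person's situation with regard to whether they are "
                               "single, married, separated, divorced or widowed.",
            "skos:narrower": [{"@id": BASE + "married"}],
        },
        {
            "@id": BASE + "married",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "Married", "@language": "en"},
            "skos:broader": {"@id": BASE + "marital-status"},
        },
    ],
}


def broader_of(graph_doc, concept_id):
    """Follow a skos:broader link from a concept using plain dictionary lookups."""
    for node in graph_doc["@graph"]:
        if node["@id"] == concept_id and "skos:broader" in node:
            return node["skos:broader"]["@id"]
    return None


# Serialise for exchange; a JSON-LD processor could expand "skos:" to full SKOS IRIs.
print(json.dumps(graph, indent=2))
print(broader_of(graph, BASE + "married"))  # https://example.org/concepts/marital-status
```

Because the relationship between 'married' and 'marital status' is stated explicitly rather than implied by the wording, software can follow it without having to interpret the labels.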

  • Improves data quality and consistency within projects

Capturing concept metadata can reduce internal inconsistencies, improve clarity, help researchers select appropriate measures and prevent duplication of similar variables. It also makes it easier to maintain conceptual coherence, especially if the meaning of a concept changes over time while its name stays the same.

Concept metadata defines what data represents, enabling clearer interpretation, comparison, and reuse across studies.