Data Product Architecture: The 3-Layer Design Explained

“Data is an asset” has become a familiar phrase, but treating data like an asset inside an organization is harder than it sounds. Data scatters across systems, the same field name carries different meanings in different teams, and nobody is quite sure who owns what. In an era where AI and machine learning sit at the center of most products, the quality of your data now determines the quality of your output.

To work with data well, an organization needs a shared language for how data is structured. This article walks through the foundation of that language: a three-layer model of data elements, data domains, and data products that forms the backbone of any usable data product architecture. The layers move from the smallest unit of meaning (element) to a managed grouping (domain) to a finished, reusable asset (product). Along the way, we cover the nine quality criteria for evaluating data, the limits of legacy pipelines, and how to identify which data deserves to become a product.

This piece is the first in a two-part series on data products and architecture. The second article covers the technology foundation — data lakes, lakehouses, data mesh, data fabric — that this design layer sits on top of.

Contents hide

1. The 3-Layer Data Design Model: Elements, Domains, and Products

2. Layer 1: Data Elements as the Foundation

2.1. Evaluating Data Quality: The Process and 9 Criteria

2.2. Tools for Sourcing and Evaluating Data

3. Layer 2: Data Domains for Ownership and Policy

4. Building a Data Roadmap from Current State to Target

5. Layer 3: Data Products as Reusable Assets

5.1. The Limits of Legacy Data Structures

5.2. The Data Product Approach

6. Consumption Archetypes and How to Identify Data Product Candidates

6.1. How to Identify Data Product Candidates

6.2. A Data Product Is Not “More Is Better.”

6.3. Developing the Data Product

7. Conclusion

The 3-Layer Data Design Model: Elements, Domains, and Products

Three-layer structure connecting data elements domains and products

Designing data well means answering a small set of questions before any pipeline is built.

What use cases will the data serve?
What data is already available?
What does the organization need to start tracking?
What risks does this data introduce, and what regulations apply to it?

Once those answers are clear, the data has to be organized. Three concepts do most of the structural work:

Data element: the smallest unit of information that carries its own meaning. Examples: customer name, customer address, product name, transaction date.
Data domain: a conceptual grouping of related elements. Domains are the unit used in data governance and data architecture work.
Data product: a high-quality, ready-to-use data set that anyone in the organization can pull off the shelf. A data product is usually built from a subset of one or more domains.

A simple analogy: elements are ingredients, domains are the drawers you organize ingredients into, and products are the finished dishes you cook from several drawers. Each layer is covered in the sections that follow, starting with the element.

Layer 1: Data Elements as the Foundation

A data element is the smallest unit of information that still means something on its own. “Customer name,” “transaction amount,” and “signup date” are elements. Break them down further and the meaning disappears. In a bank, the element list might include customer name, account number, transaction amount, transaction timestamp, and branch code. Each value is small, but each one is the atom that every higher layer is built from.

This matters because domains and products are nothing more than collections of elements. If something is wrong at the element level, every structure stacked on top of it inherits the problem. If “customer name” is stored as “Hong Gildong” in one system, “Hong Gil-dong” in another, and “HONG, GIL DONG” in a third, the customer domain and the customer data product built from those three sources are unreliable from day one.

Two questions matter most when handling a data element:

Definition: what exactly does this element mean? A “signup date” is ambiguous unless the team agrees whether it refers to the moment of account registration, the first transaction, or contract execution.
Quality: how accurately and consistently is the agreed definition reflected in the stored values?

Definitions are settled by organizational agreement. Quality requires a separate framework and a structured evaluation process.

Evaluating Data Quality: The Process and 9 Criteria

If the data you need does not exist, or exists but is low quality, no amount of solution design will save you. The AI-era cliche “garbage in, garbage out” applies literally. Before any preprocessing or use, the data has to be assessed.

Evaluation moves through three steps:

Define data quality rules based on current and expected business needs.
Set a target value for each rule that satisfies the business need.
Measure the data and report performance against the rules.

The nine criteria below give you the measurement axes: accuracy, timeliness, consistency, completeness, uniqueness, coherence (whether the definition holds up over time), availability, security, and interpretability. Each criterion produces a score, and the scores together tell you whether the element is ready to be promoted into a domain or product.

Tools for Sourcing and Evaluating Data

When the volume of data is large, machine learning and AI tools like Talend, Trillium Quality, Pherlink, and Syncsort make the cleansing process more efficient. Another practical approach is to build the data at a minimum viable level first, then improve quality and coverage iteratively through data enrichment. Enrichment is usually cheaper than trying to get every element perfect on day one.

Layer 2: Data Domains for Ownership and Policy

Elements multiply quickly. Once an organization has thousands of them, managing one at a time stops working. Who owns which field? Which policies apply to which data? The answers tangle within weeks. The domain solves this by grouping related elements into a single unit of meaning. A data domain is a conceptual grouping of elements that belong to the same business subject.

In a bank, typical domains look like this:

Customer domain: customer ID, name, date of birth, address, contact, signup date
Account domain: account number, account type, opening date, balance, status
Transaction domain: transaction ID, timestamp, amount, type, counterparty account
Product domain: product code, product name, interest rate, conditions, sale period

The grouping is based on business meaning, not on technical storage. Customer information might live in a CRM, a loan system, and a mobile app, but from a domain perspective it is one customer domain.

Three things become clearer once data is viewed at the domain level:

Ownership: each domain has a responsible team and a domain owner. The customer domain belongs to the customer management team; the transaction domain belongs to retail banking. Without a single owner, scattered customer information has nowhere for definition conflicts or quality issues to land.
Policy scope: rules for security, retention, access, and quality apply at the domain level. Encrypting all personal data and logging access across the customer domain is far more efficient than attaching a separate policy to each element.
Material for data products: data products are usually built from one full domain or a combination of several. The better the domains are defined, the easier product design becomes.

A domain is the unit where you set owners and access rules. That structure is what makes the next layer — the data product — possible.

Building a Data Roadmap from Current State to Target

Once data quality has been assessed, the next move is a data roadmap: a plan for how the current state evolves toward the structure the organization needs. Just as digital solutions evolve to meet customer needs, the data underneath has to evolve in form and resourcing to support that value.

A data roadmap usually moves through three stages:

Define data operations for the present: acquire the priority data elements and decide how the organization will manage and use them.
Develop data pipelines and architecture: define the pipelines and architecture each domain needs, and plan for the domains and elements that will be added later.
Build data governance: establish the discipline that keeps existing data correct and ensures new data is collected properly. Governance is what makes the structure sustainable.

The roadmap is the bridge between assessing data and treating it as a product.

Layer 3: Data Products as Reusable Assets

Data should be managed like a product. Inside an organization, data is consumed by business decisions, research, and external customers — and each consumer needs the data in a specific shape, on a specific schedule, with specific quality guarantees. Treating data as a product forces the team to ask who the users are and what they need, the same way any product team would.

A data product is a “finished good” assembled from one domain or a combination of domains, packaged so that anyone in the organization can pull it off the shelf. A “Customer 360 data product,” for example, integrates basic information from the customer domain, recent transaction patterns from the transaction domain, and product holdings from the product domain — already joined, already quality-checked. Marketing uses it for campaigns, risk uses it for credit scoring, and the app team uses it for personalization. The core difference is that no team has to pull raw data and reshape it on their own.

The Limits of Legacy Data Structures

Fragmented pipelines contrasted with a shared reusable data product layer

Most legacy data setups suffer from the same shape. Pipelines are fragmented, duplicated, and expensive to maintain. The diagram below shows the pattern.

[Legacy data structure:
 a separate pipeline for each use case]

System of record ──▶ Data platform
                          │
    ┌─────────┬───────────┼───────────┐
    ▼         ▼           ▼           ▼         
Dedicated  Dedicated  Dedicated  Dedicated
dataset A  dataset B  dataset C  dataset D
    │         │           │           │
    ▼         ▼           ▼           ▼
Dedicated  Dedicated  Dedicated  Dedicated
 tech A     tech B     tech C     tech D
    │         │           │           │
    ▼         ▼           ▼           ▼
Use case A Use case B Use case C Use case D

The core problem in this diagram is that a single system of record spawns N fragmented pipelines, and the work done in one pipeline cannot be shared with another. It looks natural at first glance, but the duplication is structural.

Consider a bank where the digital banking app, a cross-sell prediction model, and the financial reporting team all need customer data. Even though they need the same underlying records, each builds its own pipeline against its own standards, producing datasets in different formats and at different quality levels. Three inefficiencies pile up:

Duplicated work: the same source data is reprocessed for each use case, multiplying the same effort across the organization.
Higher cost: each use case carries its own technology stack, multiplying licensing, infrastructure, and operations cost.
Quality gaps: the same “customer ID” is defined differently and governed differently in each pipeline, so outputs disagree with each other.

The data product approach is the response to this accumulated inefficiency.

The Data Product Approach

A data product is a packaged data asset that digital applications and internal business systems can pull from for their specific purposes. The data product approach adds two new ideas to the legacy structure: a shared product layer between platform and use case, and a typology of how that product gets consumed.

[Data product approach:
 a shared, standardized asset
 serving many use cases]

System of record ──▶ Data platform
                          │
                          ▼
  ┌───────────────────────────────────┐
  │         Data product              │
  │ ┌───────────────────────────────┐ │
  │ │ Domain A (e.g., Customer)     │ │
  │ │ Domain B (e.g., Transac.)     │ │
  │ │ Domain C (e.g., Product)      │ │
  │ │ Domain D ...                  │ │
  │ └───────────────────────────────┘ │
  └────────────────┬──────────────────┘
                   │
    ┌──────────┬───┼───────┬──────────┐
    ▼          ▼   ▼       ▼          
Archetype 1 Archetype 2 Archetype 3 Archetype 4
(Digital    (Advanced   (Reporting) (External
 app)        analytics)              data sharing)
    │          │           │          │
    ▼          ▼           ▼          ▼
Use case A Use case B Use case C Use case D

The structural change is that a shared product layer now sits between the data platform and the use cases. Two consequences follow. First, a domain that has been organized once is reused across many use cases, so duplicated processing disappears. Second, a consumption archetype layer sits between the product and the use case, formalizing the question “what kind of consumption is this data for?”

A real-time digital app, a machine learning model, and a quarterly report all need different shapes of data, and the archetype is the contract that names that shape.

Consumption Archetypes and How to Identify Data Product Candidates

When you design a data product, you have to decide upfront who will use it and how. The same customer data product behaves very differently when it is called from a real-time app, used in a quarterly report, or fed as training data into a machine learning model. The required structure, refresh cadence, quality bar, and governance level all change with the use.

To manage that variation, McKinsey classifies data product consumption into five archetypes. An archetype is not just a list of uses. It is a template that formalizes the data structure and operational requirements for each kind of consumption. Decide the archetype first, and the level of processing and governance the product needs becomes clear.

How to Identify Data Product Candidates

Not every data element needs to become a data product. Productization makes sense only when the use case is repeated enough to justify the investment and the business value is high enough to warrant ongoing operation. Data used in a single isolated case can stay inside that case’s pipeline.

Three criteria separate a product candidate from a non-candidate:

Reuse frequency
Business impact
Management difficulty

These are the large axes. In practice, two specific questions surface the answer:

Are there different use cases that repeatedly need this data?
Are new use cases for this data likely to appear in the next six months?

If the answer to either is yes, the data is a product candidate. If not, leaving it in a use-case pipeline is the reasonable choice. Data that was not a candidate at first can always be promoted to a product later, once the demand shows up.

A Data Product Is Not “More Is Better.”

It is a strategically chosen asset that the organization commits to operating well. Turning every dataset into a product spreads engineering and governance resources thin, and the products that actually matter lose quality and operational maturity as a result.

Building a data product takes meaningful time and money. Before committing, the team should have a clear view of the data need, the use cases, the reuse potential, and the business value. Four questions help test whether the investment is justified:

Is this data product a unique asset of our organization? — If not, check whether it can be sourced externally at a reasonable cost instead of being built internally.
Will it be useful to the people and systems that end up using it? — Confirm that real users inside or outside the organization will use it repeatedly and meaningfully.
What does “good data” mean here? — Sometimes freshness matters most; sometimes a smaller, higher-quality, more specific dataset wins. The quality bar depends on the use.
How many use cases will it support, and what is each one worth? — A data product should raise the utility of the underlying data by serving multiple use cases well.

Developing the Data Product

Data product development breaks into three areas and six stages, moving from definition through build and into operation. The full table of stages and deliverables is the topic of a follow-up walkthrough, but the structure follows the same logic as any product development cycle: define the user and use case, design the structure, build, validate, deploy, and operate.

Conclusion

The core shift is to stop treating data as something to manage and start treating it as an asset to design like a product. That design starts at the smallest unit — the element — moves through the meaningful grouping — the domain — and arrives at the reusable product.

Each layer carries its own responsibilities: elements demand definition and quality, domains demand ownership and policy, products demand a clear consumer and a clear archetype.

The three-layer model answers the question of what to manage. The next question is what technology foundation to build it on. Whether to use a data lake, a lakehouse, or a data mesh depends on organizational size, maturity, and the consumption archetypes you have committed to.

The next article in this series, Data Architecture and Data Capabilities, covers that decision and the data capabilities an organization needs to make the design work in production.

Data Products and Architecture Series

(1) Data Product Architecture: The 3-Layer Design of Elements, Domains, and Products

(2) Data Architecture Explained: 5 Archetypes and the Capability Map Behind Them

Data Product Architecture: The 3-Layer Design of Elements, Domains, and Products