
Medallion Architecture: From Raw to Analytics-Ready in Three Layers

Jan 2026 · 7 min read

What Is the Medallion Architecture?

The Medallion Architecture — also known as the multi-hop architecture — organizes data in a lakehouse into three progressive layers: Bronze (raw), Silver (cleansed), and Gold (business-ready). Each layer adds incremental quality, structure, and governance. Think of it as a refining process: raw ore enters the Bronze layer, gets purified in Silver, and emerges as polished analytics-ready datasets in Gold.

Bronze Layer: Raw Ingestion

The Bronze layer is your system of record: an append-only landing zone where data arrives in its original format. We store everything as Delta tables with audit columns (ingestion timestamp and source system identifier) alongside a raw payload column that holds each record exactly as received, so parsing is deferred to downstream layers (schema-on-read). No transformations happen here. The goal is to preserve the data exactly as it arrived, so that if business logic changes downstream, Silver and Gold can be rebuilt by reprocessing Bronze. We typically retain Bronze data for 90 days, which also bounds how far back such a full reprocess can reach.
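A minimal plain-Python sketch of the Bronze contract, assuming a JSON source named `shop_api` (hypothetical). It models the append-only table in memory instead of using Delta Lake, so `BronzeTable`, `to_bronze_record`, and `apply_retention` are illustrative names, not library APIs. It shows the three properties described above: audit columns wrapped around an untouched payload, no update path, and a 90-day retention window.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def to_bronze_record(raw_payload: str, source_system: str,
                     ingested_at: Optional[datetime] = None) -> Dict[str, str]:
    """Wrap an untouched payload with audit columns; no parsing or cleanup."""
    ts = ingested_at or datetime.now(timezone.utc)
    return {
        "ingestion_ts": ts.isoformat(),   # when the record landed
        "source_system": source_system,   # where it came from
        "raw_payload": raw_payload,       # exactly as received
    }


class BronzeTable:
    """Hypothetical in-memory stand-in for an append-only Bronze Delta table."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, str]] = []

    def append(self, record: Dict[str, str]) -> None:
        # Append-only: there is deliberately no update or delete-by-key method.
        self._rows.append(dict(record))

    def rows(self) -> Tuple[Dict[str, str], ...]:
        # Return copies so callers cannot rewrite history in place.
        return tuple(dict(r) for r in self._rows)

    def apply_retention(self, now: datetime, days: int = 90) -> int:
        """Drop rows older than the retention window; return how many were removed."""
        cutoff = now - timedelta(days=days)
        kept = [r for r in self._rows
                if datetime.fromisoformat(r["ingestion_ts"]) >= cutoff]
        removed = len(self._rows) - len(kept)
        self._rows = kept
        return removed


# Usage: the malformed amount is kept as-is, because Bronze does not validate.
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
bronze = BronzeTable()
bronze.append(to_bronze_record('{"order_id": 1, "amount": "19.99"}', "shop_api", t0))
bronze.append(to_bronze_record('{"order_id": 2, "amount": "oops"}', "shop_api",
                               t0 + timedelta(days=100)))
removed = bronze.apply_retention(now=t0 + timedelta(days=120))
```

In a real lakehouse the append would be a Delta write in append mode, and retention would typically be a scheduled delete of old data followed by `VACUUM`.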

Silver Layer: Cleansing and Conforming

The Silver layer applies data quality rules, deduplication, type casting, and schema enforcement. This is where we join related tables, resolve foreign keys, and standardize column names to a consistent convention. We use PySpark with Delta Lake's MERGE operation to handle upserts efficiently. Key data quality checks include null checks on required fields, referential integrity validation, date range validation, and duplicate detection using composite keys. Rows that fail validation are routed to a quarantine table for investigation.
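A plain-Python sketch of one Silver batch, assuming Bronze rows whose `raw_payload` holds JSON orders. The order schema, the composite key `(customer_id, order_id)`, and the function names are hypothetical. It runs the checks listed above in order, routes failures to a quarantine list along with their error reasons, deduplicates within the batch on the composite key, and then upserts. The final loop only mimics MERGE semantics (update when the key matches, insert otherwise); in PySpark you would express that with `DeltaTable.merge(...)` chained with `whenMatchedUpdateAll()`, `whenNotMatchedInsertAll()`, and `execute()`.

```python
import json
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical order schema and composite key; adjust to your own sources.
REQUIRED = ("order_id", "customer_id", "order_date", "amount")
KEY = ("customer_id", "order_id")


def validate(raw_payload: str, known_customers: set, min_date: date,
             max_date: date) -> Tuple[Optional[dict], List[str]]:
    """Parse, type-cast, and check one Bronze payload; return (row, errors)."""
    try:
        rec = json.loads(raw_payload)
    except json.JSONDecodeError:
        return None, ["unparseable payload"]
    if not isinstance(rec, dict):
        return None, ["payload is not a JSON object"]
    # Null checks on required fields.
    errors = [f"null {col}" for col in REQUIRED if rec.get(col) is None]
    if errors:
        return None, errors
    # Type casting / schema enforcement.
    try:
        row = {
            "order_id": int(rec["order_id"]),
            "customer_id": int(rec["customer_id"]),
            "order_date": date.fromisoformat(rec["order_date"]),
            "amount": float(rec["amount"]),
        }
    except (TypeError, ValueError) as exc:
        return None, [f"type cast failed: {exc}"]
    # Referential integrity and date range validation.
    if row["customer_id"] not in known_customers:
        errors.append("unknown customer_id")
    if not (min_date <= row["order_date"] <= max_date):
        errors.append("order_date out of range")
    return (None, errors) if errors else (row, [])


def run_silver_batch(bronze_rows: List[dict], silver: Dict[tuple, dict],
                     quarantine: List[dict], known_customers: set,
                     min_date: date, max_date: date) -> Tuple[int, int]:
    """Validate a batch, quarantine failures, dedupe, then upsert into silver."""
    deduped: Dict[tuple, dict] = {}
    for b in bronze_rows:
        row, errors = validate(b["raw_payload"], known_customers, min_date, max_date)
        if errors:
            quarantine.append({**b, "errors": errors})
            continue
        # Duplicate detection: the last row seen for a composite key wins.
        deduped[tuple(row[k] for k in KEY)] = row
    inserted = updated = 0
    for key, row in deduped.items():
        # MERGE semantics: matched -> update, not matched -> insert.
        if key in silver:
            updated += 1
        else:
            inserted += 1
        silver[key] = row
    return inserted, updated


silver: Dict[tuple, dict] = {}
quarantine: List[dict] = []
batch = [
    {"raw_payload": '{"order_id": 1, "customer_id": 7, "order_date": "2026-01-05", "amount": "19.99"}'},
    {"raw_payload": '{"order_id": 1, "customer_id": 7, "order_date": "2026-01-05", "amount": "21.50"}'},
    {"raw_payload": '{"order_id": 2, "customer_id": 99, "order_date": "2026-01-06", "amount": "5"}'},
    {"raw_payload": '{"order_id": 3, "customer_id": 7, "order_date": null, "amount": "5"}'},
]
inserted, updated = run_silver_batch(batch, silver, quarantine, known_customers={7},
                                     min_date=date(2025, 1, 1), max_date=date(2026, 12, 31))
```

Deduplicating before the upsert matters beyond tidiness: Delta Lake rejects a MERGE in which multiple source rows match the same target row.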

Gold Layer: Business Aggregation

The Gold layer contains pre-aggregated, denormalized datasets tailored to specific business use cases: dashboards, ML features, or API responses. Each Gold table has a clear business owner and a documented SLA for freshness. We avoid the "one Gold table to rule them all" anti-pattern; instead, each domain team defines its own Gold tables based on its analytical needs. This reduces contention and makes ownership crystal clear.
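A minimal sketch of one domain-owned Gold table, assuming cleansed Silver order rows with `order_date` and `amount` fields. The table name `gold.finance_daily_revenue`, the owner `finance-analytics`, and the 24-hour SLA are hypothetical. Attaching the owner and freshness SLA to the table definition itself makes them checkable rather than tribal knowledge.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class GoldTableSpec:
    """Hypothetical metadata every Gold table carries: a name, an owner, and an SLA."""
    name: str
    owner: str                # accountable business owner
    freshness_sla: timedelta  # maximum allowed age of the data


def build_daily_revenue(silver_rows: List[dict]) -> List[dict]:
    """Pre-aggregate cleansed orders into one denormalized row per day."""
    totals: Dict[date, float] = defaultdict(float)
    for r in silver_rows:
        totals[r["order_date"]] += r["amount"]
    return [{"order_date": d, "revenue": round(t, 2)} for d, t in sorted(totals.items())]


def is_fresh(spec: GoldTableSpec, last_refreshed: datetime, now: datetime) -> bool:
    """True if the table was refreshed within its documented freshness SLA."""
    return now - last_refreshed <= spec.freshness_sla


daily_revenue = GoldTableSpec("gold.finance_daily_revenue", owner="finance-analytics",
                              freshness_sla=timedelta(hours=24))
silver_rows = [
    {"order_date": date(2026, 1, 5), "amount": 21.5},
    {"order_date": date(2026, 1, 5), "amount": 10.0},
    {"order_date": date(2026, 1, 6), "amount": 5.0},
]
result = build_daily_revenue(silver_rows)
refreshed = datetime(2026, 1, 7, 6, 0)
fresh_same_day = is_fresh(daily_revenue, refreshed, datetime(2026, 1, 7, 20, 0))  # 14 h old
fresh_next_day = is_fresh(daily_revenue, refreshed, datetime(2026, 1, 8, 7, 0))   # 25 h old
```

A monitoring job could call `is_fresh` on a schedule and notify the table's `owner` when the SLA is breached.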

Data Quality at Every Layer

Quality gates between layers are the backbone of the Medallion Architecture. We implement expectations (named, declarative checks similar to those in the Great Expectations library) at each transition. Bronze → Silver checks focus on schema conformance and basic validity. Silver → Gold checks focus on business rules and aggregation correctness. Every failed check is logged to a centralized data quality dashboard and triggers an alert. This layered approach means a bad file in Bronze cannot silently corrupt a Gold report.
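A plain-Python sketch of gates between layers. `Expectation`, `QualityGate`, and `promote` are hypothetical names; this does not use the Great Expectations API, it only mirrors the idea of named, declarative checks. The `log` list stands in for the centralized data quality dashboard, and the `alert` callback stands in for whatever paging or chat integration you use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Expectation:
    """A named, declarative check over a batch of rows (hypothetical, not a library class)."""
    name: str
    check: Callable[[List[dict]], bool]


@dataclass
class QualityGate:
    transition: str  # e.g. "bronze->silver"
    expectations: List[Expectation]

    def run(self, rows: List[dict], log: List[dict],
            alert: Callable[[str], None]) -> bool:
        """Evaluate every expectation; log and alert on each failure."""
        failed = [e.name for e in self.expectations if not e.check(rows)]
        for name in failed:
            log.append({"transition": self.transition, "expectation": name})
            alert(f"[{self.transition}] expectation failed: {name}")
        return not failed


def promote(rows: List[dict], gate: QualityGate, target: List[dict],
            log: List[dict], alert: Callable[[str], None]) -> bool:
    """Copy rows into the next layer only if the whole gate passes."""
    if gate.run(rows, log, alert):
        target.extend(rows)
        return True
    return False


# Schema conformance and basic validity between Bronze and Silver.
bronze_to_silver = QualityGate("bronze->silver", [
    Expectation("required columns present",
                lambda rows: all("order_id" in r and "amount" in r for r in rows)),
    Expectation("amount is numeric",
                lambda rows: all(isinstance(r.get("amount"), (int, float)) for r in rows)),
])
# Business rules between Silver and Gold.
silver_to_gold = QualityGate("silver->gold", [
    Expectation("batch is non-empty", lambda rows: len(rows) > 0),
    Expectation("no negative amounts", lambda rows: all(r["amount"] >= 0 for r in rows)),
])

dq_log: List[dict] = []
alerts: List[str] = []
silver: List[dict] = []
gold: List[dict] = []
promoted_bad = promote([{"order_id": 1, "amount": 21.5}, {"order_id": 2, "amount": "oops"}],
                       bronze_to_silver, silver, dq_log, alerts.append)
promoted_good = promote([{"order_id": 3, "amount": 5.0}],
                        bronze_to_silver, silver, dq_log, alerts.append)
promoted_gold = promote(silver, silver_to_gold, gold, dq_log, alerts.append)
```

The design choice here is all-or-nothing promotion: one failing expectation stops the whole batch at the gate instead of letting it leak into the next layer. A batch-level gate like this complements, rather than replaces, the row-level quarantine used in Silver.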

When to Use (and When Not to)

The Medallion Architecture excels when you have diverse data sources, need audit trails, and serve multiple downstream consumers. It adds overhead that may not be justified for small, single-source pipelines or exploratory analytics. If your entire data estate fits in a single PostgreSQL database, you probably don't need three layers. But the moment you have 5+ sources feeding 10+ dashboards, the structure pays for itself in reduced debugging time and faster onboarding of new engineers.

Key Takeaways

  1. Bronze = raw and immutable; Silver = cleansed and conformed; Gold = business-ready aggregations.
  2. Each layer adds incremental quality; never skip layers by going straight from Bronze to Gold.
  3. Delta Lake's MERGE operation makes upserts efficient and ACID-compliant in the Silver layer.
  4. Assign clear business ownership to Gold tables to avoid "shared everything, owned by nobody" anti-patterns.
  5. Implement data quality gates at every layer transition to prevent bad data from propagating downstream.