What Is the Medallion Architecture?
The Medallion Architecture — also known as the multi-hop architecture — organizes data in a lakehouse into three progressive layers: Bronze (raw), Silver (cleansed), and Gold (business-ready). Each layer adds incremental quality, structure, and governance. Think of it as a refining process: raw ore enters the Bronze layer, gets purified in Silver, and emerges as polished analytics-ready datasets in Gold.
Bronze Layer: Raw Ingestion
The Bronze layer is your system of record: an append-only, schema-on-read landing zone where data arrives in its original format. We store everything as Delta tables with audit columns (ingestion timestamp and source system identifier) alongside a raw payload column. No transformations happen here. The goal is to preserve the exact data as it arrived, enabling full reprocessing from scratch if business logic changes downstream. We typically retain Bronze data for 90 days, so a reprocess from Bronze can reach back only that far.
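As a rough illustration of the Bronze contract described above, the sketch below wraps an incoming payload with the two audit columns and stores the payload untouched. It uses plain Python rather than Delta tables so that it runs anywhere; the names to_bronze_record, append_to_bronze, and the "crm" source identifier are hypothetical and not taken from any real pipeline.

```python
from datetime import datetime, timezone
from typing import Optional


def to_bronze_record(
    raw_payload: str,
    source_system: str,
    ingested_at: Optional[datetime] = None,
) -> dict:
    """Wrap a payload with audit columns without parsing or altering it."""
    ts = ingested_at or datetime.now(timezone.utc)
    return {
        "ingested_at": ts.isoformat(),   # audit column: when the data landed
        "source_system": source_system,  # audit column: where it came from
        "raw_payload": raw_payload,      # stored exactly as received
    }


def append_to_bronze(table: list, record: dict) -> None:
    """Append-only write: existing rows are never updated or deleted."""
    table.append(record)


# A list stands in for the append-only Bronze table (assumption for the sketch).
bronze_table: list = []
payload = '{"id": 1, "amt": "12.50"}'  # note: the amount stays a string here
append_to_bronze(
    bronze_table,
    to_bronze_record(payload, "crm", datetime(2024, 1, 1, tzinfo=timezone.utc)),
)
```

Note that the amount is left as a string. Type casting belongs to the Silver layer, so Bronze makes no attempt to interpret the payload.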
Silver Layer: Cleansing and Conforming
The Silver layer applies data quality rules, deduplication, type casting, and schema enforcement. This is where we join related tables, resolve foreign keys, and standardize column naming conventions. We use PySpark with Delta Lake's MERGE operation to handle upserts efficiently. Key data quality checks include: null checks on required fields, referential integrity validation, date range validation, and duplicate detection using composite keys. Rows that fail validation are routed to a quarantine table for investigation.
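The validation-and-quarantine flow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names (order_id, customer_id, order_date), the accepted date range, and the rule that the first row seen for a composite key wins while later duplicates are quarantined are all hypothetical choices, and a real pipeline would express the same checks in PySpark against Delta tables.

```python
from datetime import date

# Hypothetical schema and bounds, chosen only for this sketch.
REQUIRED_FIELDS = ("order_id", "customer_id", "order_date")
MIN_DATE, MAX_DATE = date(2000, 1, 1), date(2100, 1, 1)


def validate(row: dict, known_customer_ids: set) -> list:
    """Return the list of failed checks for one row; empty means it passes."""
    failures = []
    # Null checks on required fields.
    for field in REQUIRED_FIELDS:
        if row.get(field) is None:
            failures.append(f"null:{field}")
    # Referential integrity: the customer must exist in the parent table.
    customer_id = row.get("customer_id")
    if customer_id is not None and customer_id not in known_customer_ids:
        failures.append("referential_integrity:customer_id")
    # Date range validation.
    order_date = row.get("order_date")
    if order_date is not None and not (MIN_DATE <= order_date <= MAX_DATE):
        failures.append("date_range:order_date")
    return failures


def cleanse(rows: list, known_customer_ids: set) -> tuple:
    """Split rows into (silver, quarantine), deduplicating on a composite key."""
    silver, quarantine = {}, []
    for row in rows:
        failures = validate(row, known_customer_ids)
        key = (row.get("order_id"), row.get("customer_id"))  # composite key
        if not failures and key in silver:
            failures.append("duplicate:order_id+customer_id")
        if failures:
            # Keep the failing row and the reasons so it can be investigated.
            quarantine.append({**row, "failures": failures})
        else:
            silver[key] = row
    return list(silver.values()), quarantine
```

Recording the failure reasons next to the quarantined row means an investigator does not have to re-run the checks to find out why a row was rejected.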
Gold Layer: Business Aggregation
The Gold layer contains pre-aggregated, denormalized datasets tailored to specific business use cases such as dashboards, ML features, or API responses. Each Gold table has a clear business owner and a documented SLA for freshness. We avoid the "one Gold table to rule them all" anti-pattern; instead, each domain team defines its own Gold tables based on its analytical needs. This reduces contention and leaves no ambiguity about who owns which table.
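To make "pre-aggregated and denormalized" concrete, here is a minimal sketch of one narrowly scoped Gold table: daily revenue per region, built from cleansed Silver rows. The table, the column names (order_date, region, amount_cents), and the function build_daily_revenue are hypothetical examples, and plain Python stands in for whatever engine actually runs the aggregation.

```python
from collections import defaultdict
from datetime import date


def build_daily_revenue(silver_orders: list) -> list:
    """Aggregate Silver order rows into one Gold row per (date, region)."""
    totals = defaultdict(lambda: {"order_count": 0, "revenue_cents": 0})
    for order in silver_orders:
        key = (order["order_date"], order["region"])
        totals[key]["order_count"] += 1
        totals[key]["revenue_cents"] += order["amount_cents"]
    # Flatten the grouping key into columns so consumers need no further joins.
    return [
        {"order_date": day, "region": region, **agg}
        for (day, region), agg in sorted(totals.items())
    ]


silver_orders = [
    {"order_date": date(2024, 3, 1), "region": "EU", "amount_cents": 1000},
    {"order_date": date(2024, 3, 1), "region": "EU", "amount_cents": 250},
    {"order_date": date(2024, 3, 1), "region": "US", "amount_cents": 500},
]
gold_daily_revenue = build_daily_revenue(silver_orders)
```

A table this small answers one question for one team, which is the point: a second team with different needs would define its own Gold table rather than add columns to this one.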
Data Quality at Every Layer
Quality gates between layers are the backbone of the Medallion Architecture. We implement expectations (declarative checks similar to those in Great Expectations) at each transition. Bronze → Silver checks focus on schema conformance and basic validity. Silver → Gold checks focus on business rules and aggregation correctness. Every failed check logs to a centralized data quality dashboard and triggers an alert. With this layered approach, a bad file in Bronze is caught at a gate instead of silently corrupting a Gold report.
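A quality gate of this kind can be sketched as a list of named expectations that must all pass before data moves to the next layer. The code below is a simplified stand-in and does not use the Great Expectations API: the Expectation class, run_gate, QualityGateError, and the in-memory results_log (standing in for the dashboard and alerting) are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expectation:
    name: str
    check: Callable[[list], bool]  # receives all rows, returns pass/fail


class QualityGateError(Exception):
    """Raised when a layer transition is blocked by failed expectations."""


def run_gate(transition: str, rows: list, expectations: list, results_log: list) -> list:
    """Evaluate every expectation, log each result, and block on any failure."""
    failed = []
    for exp in expectations:
        passed = exp.check(rows)
        # Every result is logged; in practice this would feed a dashboard.
        results_log.append(
            {"transition": transition, "expectation": exp.name, "passed": passed}
        )
        if not passed:
            failed.append(exp.name)
    if failed:
        # Raising here is what keeps bad data out of the next layer.
        raise QualityGateError(f"{transition} blocked: {', '.join(failed)}")
    return rows


# Hypothetical Bronze -> Silver expectations: schema conformance and validity.
EXPECTED_COLUMNS = {"order_id", "amount_cents"}
bronze_to_silver = [
    Expectation(
        "has_expected_columns",
        lambda rows: all(EXPECTED_COLUMNS.issubset(r.keys()) for r in rows),
    ),
    Expectation("batch_not_empty", lambda rows: len(rows) > 0),
]
```

All expectations run even after one fails, so a single batch produces a complete failure report instead of stopping at the first problem. A Silver → Gold gate would reuse run_gate with business-rule checks, for example that aggregated totals match the sum of the underlying Silver rows.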
When to Use (and When Not to)
The Medallion Architecture excels when you have diverse data sources, need audit trails, and serve multiple downstream consumers. It adds overhead that may not be justified for small, single-source pipelines or exploratory analytics. If your entire data estate fits in a single PostgreSQL database, you probably don't need three layers. But the moment you have 5+ sources feeding 10+ dashboards, the structure pays for itself in reduced debugging time and faster onboarding of new engineers.