Hello, I'm
Building high-throughput, cost-efficient data pipelines across Azure & AWS — transforming raw data into actionable intelligence at scale.
Bridging the gap between raw data and business intelligence
Results-driven Data Engineer with 5+ years of experience designing and optimizing end-to-end data pipelines on Azure and AWS. Skilled in ETL/ELT, data modeling, CDC/SCD, and orchestration with Airflow. Proven track record of building scalable, cost-efficient, and secure data workflows, integrating batch and streaming sources such as Kafka and relational databases into Snowflake and Delta Lake for analytics.
5+ years designing end-to-end ETL/ELT data pipelines processing hundreds of GBs daily with CDC, SCD, and Medallion Architecture.
Deep experience across Azure (ADLS, ADF, Databricks, Synapse) and AWS (S3, Lambda, Glue, CloudWatch) ecosystems.
Proven track record of optimizing throughput, reducing processing costs by 30%, and achieving 99.9% uptime on production systems.
The tools and platforms I use to build data systems
A track record of building data systems that deliver real business impact
Data Engineer · Bangalore
Senior Software Engineer · Noida
Apprentice Trainee · Chennai
Real-world data engineering solutions with measurable impact
An intelligent resume and portfolio builder powered by AI suggestions, offering multiple professionally designed templates, real-time preview, and PDF export capabilities.
End-to-end Change Data Capture pipeline processing 200GB daily from PostgreSQL to Snowflake. Uses Airbyte for ingestion and Airflow for orchestration, enabling real-time analytics.
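For illustration, a minimal sketch of how the orchestration side of such a pipeline can look in Airflow, using the Airbyte provider to trigger the sync and a follow-up task for reconciliation. The DAG id, connection ids, and the reconciliation callable are hypothetical placeholders, not the production code.

```python
# Minimal Airflow DAG sketch: trigger an Airbyte sync, then run a reconciliation
# check. Connection ids and the reconciliation callable are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator


def reconcile_row_counts(**context):
    """Placeholder: compare source and target row counts and fail on mismatch."""
    ...


with DAG(
    dag_id="postgres_to_snowflake_cdc",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    sync = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_default",          # Airflow connection to the Airbyte API
        connection_id="<airbyte-connection-uuid>",  # Airbyte source-to-destination connection
        asynchronous=False,
        timeout=3600,
    )

    reconcile = PythonOperator(
        task_id="reconcile_row_counts",
        python_callable=reconcile_row_counts,
    )

    sync >> reconcile
```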
High-throughput Java Spring Boot microservice processing 100K+ Kafka messages per hour into Snowflake with 99.9% uptime and 30% cost reduction.
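The service itself is Java with Spring Booot per the project above; purely as a hedged illustration of the underlying pattern (batched consumption with manual offset commits only after a successful load), here is a small Python sketch. The broker address, topic name, batch size, and loader helper are hypothetical.

```python
from confluent_kafka import Consumer


def write_batch_to_snowflake(rows):
    """Hypothetical loader: stage the batch and bulk-insert it into Snowflake."""
    ...


consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # hypothetical broker
    "group.id": "snowflake-loader",
    "enable.auto.commit": False,          # commit manually, only after a successful load
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])            # hypothetical topic

BATCH_SIZE = 5000

while True:
    msgs = consumer.consume(num_messages=BATCH_SIZE, timeout=5.0)
    rows = [m.value() for m in msgs if m.error() is None]
    if rows:
        write_batch_to_snowflake(rows)
        consumer.commit(asynchronous=False)  # offsets advance only after the load succeeds
```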
Serverless ETL pipeline for trading data using AWS Lambda, S3, Glue, and Athena with automated CloudWatch triggers for daily extraction.
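A minimal sketch of the Lambda side of such a setup, assuming a scheduled CloudWatch/EventBridge rule invokes the handler, which then starts a Glue job over the latest S3 partition via boto3. The bucket names, Glue job name, and job arguments are hypothetical.

```python
import datetime
import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    # Raw trading data is assumed to be partitioned by ingestion date,
    # e.g. s3://trades-raw/dt=2024-01-01/
    run_date = datetime.date.today().isoformat()

    response = glue.start_job_run(
        JobName="trading-data-daily-etl",   # hypothetical Glue job name
        Arguments={
            "--input_path": f"s3://trades-raw/dt={run_date}/",
            "--output_path": "s3://trades-curated/",
        },
    )
    return {"glue_job_run_id": response["JobRunId"]}
```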
Large-scale media data lakehouse built on Medallion Architecture processing high-volume datasets using PySpark with data quality checks at each layer.
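As a rough sketch of what a bronze-to-silver step with a quality gate can look like in PySpark, assuming Delta tables; the table names, mandatory columns, and the 5% rejection threshold are hypothetical.

```python
# Sketch of a bronze -> silver step with a simple data-quality gate.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("media-silver-build").getOrCreate()

bronze = spark.read.table("bronze.media_events")

# Basic cleansing: drop exact duplicates and rows missing mandatory keys.
silver = (
    bronze.dropDuplicates(["event_id"])
    .filter(F.col("event_id").isNotNull() & F.col("event_ts").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
)

# Quality gate: fail the job if too many rows were rejected at this layer.
total = bronze.count()
kept = silver.count()
if total > 0 and (total - kept) / total > 0.05:
    raise ValueError(f"Silver quality gate failed: {total - kept} of {total} rows rejected")

silver.write.format("delta").mode("overwrite").saveAsTable("silver.media_events")
```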
ADF pipeline ingesting data from multiple sources (APIs, source databases, ADLS storage) into ADLS, converting it to Parquet, and transforming it in Databricks with event-based triggers and SCD logic.
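A hedged sketch of how the SCD Type 2 piece of that Databricks step might be expressed with Delta Lake, assuming a hashed-row comparison; the path, table, and column names (customer_dim, customer_id, row_hash) are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Parquet written by the ADF copy/convert step (hypothetical mount path).
updates = spark.read.parquet("/mnt/adls/silver/customers/")
dim = DeltaTable.forName(spark, "gold.customer_dim")

# Step 1: close out current dimension rows whose attributes have changed.
(
    dim.alias("d")
    .merge(updates.alias("u"), "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.row_hash <> u.row_hash",
        set={"is_current": F.lit(False), "end_date": F.current_date()},
    )
    .execute()
)

# Step 2: append only new or changed rows as the current versions.
current = spark.read.table("gold.customer_dim").filter("is_current = true")
changed_or_new = updates.join(
    current.select("customer_id", "row_hash"), ["customer_id", "row_hash"], "left_anti"
)
(
    changed_or_new.withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").saveAsTable("gold.customer_dim")
)
```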
Data pipeline architectures I've designed and built
End-to-end Change Data Capture pipeline from PostgreSQL to Snowflake with automated orchestration and reconciliation.
High-throughput streaming system processing 100K+ messages per hour from Kafka to Snowflake.
Multi-layer data lakehouse processing high-volume datasets with quality checks at each stage.
Measurable impact through engineering excellence
Staying ahead in the ever-evolving data landscape
Exploring dbt for modern data transformations — building modular, testable SQL pipelines with version control and documentation as first-class citizens.
Experimenting with LLMs and generative AI to build intelligent tools like AI-powered resume builders and automated data documentation systems.
Building data-driven applications that leverage ML models and AI for automated insights, anomaly detection, and smart data quality monitoring.
Sharing knowledge on data engineering, cloud architecture, and best practices
A deep dive into designing Change Data Capture pipelines that process hundreds of GBs daily with automated reconciliation and SCD logic.
Practical insights from building a high-throughput Kafka consumer microservice — batch sizing strategies, error handling, and cost optimization.
How the Bronze-Silver-Gold pattern streamlines data quality, transformations, and governance in modern data lakehouses.
Best practices for structuring Airflow DAGs — dynamic task generation, retry logic, alerting, and integration with tools like Airbyte.
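As a taste of those patterns, a minimal sketch showing shared default_args with retries, a failure-alert callback, and tasks generated dynamically from a config list; the DAG id, table list, and alert hook are hypothetical.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    """Placeholder alert hook: push the failed task id to Slack, e-mail, etc."""
    print(f"Task failed: {context['task_instance'].task_id}")


default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_on_failure,
}

TABLES = ["orders", "customers", "payments"]  # hypothetical source tables


def load_table(table_name, **context):
    print(f"Loading {table_name}")


with DAG(
    dag_id="warehouse_loads",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # One load task per configured table, generated in a loop.
    for table in TABLES:
        PythonOperator(
            task_id=f"load_{table}",
            python_callable=load_table,
            op_kwargs={"table_name": table},
        )
```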
How we saved 15 hours/month by adopting GitOps with Argo CD — setup, challenges, and lessons learned during the migration.
Real-world techniques for reducing cloud costs — from Snowflake warehouse tuning to batch processing optimization on AWS.
Let's discuss data challenges and opportunities
I'm always open to discussing new opportunities, interesting data problems, or ways to collaborate on scalable data solutions.