
blog

compact blog template

  • title
  • tl;dr (1–2 lines)
  • hypothesis (measurable)
  • method / architecture (diagram + tech list)
  • required artifacts (repo, run commands, sample data)
  • results (2–3 numeric metrics)
  • lessons & trade-offs (3 bullets)
  • follow-ups & repo link

publication checklist

  • repo with run.sh / docker-compose
  • architecture diagram (png/svg) ✔
  • one-page metrics report (md/csv) ✔
  • 3–5 code snippets + key files in repo ✔
  • seo tags & 3 keywords ✔
  • publish date, read-time, difficulty tag ✔

post ideas

minimal code sketches for each idea follow the list.

  1. cleaning the real world: reproducible etl with tests — mini — hypothesis: tests + docker reproduce a clean load; metric: 100% of tests pass on a fresh env.
  2. from api to analytics: idempotent ingestion patterns — mini — hypothesis: idempotency avoids duplicates; metric: 0 duplicate rows after 10 replays.
  3. designing data lake zones: cost and partitioning trade-offs — intermediate — hypothesis: proper partitioning halves query scan bytes; metric: % scan-bytes reduction.
  4. airflow in production: retries, backfills and testing — intermediate — hypothesis: robust dag reduces manual interventions; metric: increase in successful completion rate after retries.
  5. building a realtime fraud detector: sliding windows and stateful processing — advanced — hypothesis: windowed aggregation detects anomalies within sla; metric: p95 latency and detection precision.
  6. cdc for analytics: schema evolution without downtime — advanced — hypothesis: cdc + idempotent upserts keep analytic view consistent; metric: consistency rate after mixed ops.
  7. feature stores demystified: offline-online consistency — expert — hypothesis: parity checks keep offline-online mismatch under 1%; metric: offline-online mismatch %.
  8. starting a data mesh: governance, contracts, and small experiments — expert — hypothesis: domain contracts reduce integration failures; metric: integration failures caught by contract checks.
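
sketches

a minimal sketch for idea 1, assuming a hypothetical clean() transform: the test pins down deterministic output, and running it under docker supplies the fresh-env guarantee.

```python
# pytest sketch for idea 1: assert a (hypothetical) transform yields the
# same clean rows on every run; docker supplies the fresh environment.
def clean(rows):
    # trim whitespace, lowercase, drop empties, dedupe preserving order
    seen, out = set(), []
    for r in rows:
        r = r.strip().lower()
        if r and r not in seen:
            seen.add(r)
            out.append(r)
    return out

def test_clean_load_is_reproducible():
    raw = ["  Alice ", "alice", "", "Bob"]
    assert clean(raw) == ["alice", "bob"]  # expected clean load
    assert clean(raw) == clean(raw)        # deterministic across runs
```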
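
a minimal sketch for idea 2, assuming sqlite and a hypothetical events table keyed by event_id: INSERT OR IGNORE makes replaying a batch a no-op, which is exactly the 0-duplicates metric.

```python
# idempotency sketch for idea 2: replaying the same batch must not
# create duplicates. assumes sqlite and a hypothetical `events` table
# keyed by a natural event_id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

batch = [("evt-1", "signup"), ("evt-2", "purchase")]

def ingest(rows):
    # INSERT OR IGNORE makes the write idempotent: rows whose event_id
    # already exists are skipped instead of duplicated
    conn.executemany(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)", rows
    )
    conn.commit()

for _ in range(10):  # replay the batch 10 times, per the metric
    ingest(batch)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print("duplicate rows after 10 replays:", count - len(batch))  # -> 0
```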
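
a minimal sketch for idea 3 with hypothetical file sizes: partition pruning skips files outside the predicate, and the % scan-bytes reduction falls out directly.

```python
# partition-pruning sketch for idea 3: estimate bytes scanned by a
# date-filtered query with and without pruning. file sizes are
# hypothetical; the metric is the % scan-bytes reduction.
files = {  # partition value (event_date) -> file size in bytes
    "2024-01-01": 500_000_000,
    "2024-01-02": 500_000_000,
    "2024-01-03": 500_000_000,
    "2024-01-04": 500_000_000,
}

def scan_bytes(predicate=None):
    # an unpartitioned scan reads every file; a partitioned table skips
    # files whose partition value fails the predicate
    return sum(
        size for part, size in files.items()
        if predicate is None or predicate(part)
    )

full = scan_bytes()
pruned = scan_bytes(lambda d: d == "2024-01-03")  # WHERE event_date = ...
print(f"scan-bytes reduction: {100 * (1 - pruned / full):.0f}%")  # 75% here
```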
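
a minimal sketch for idea 4, assuming airflow 2.4+ and hypothetical dag/task names: retries absorb transient failures, and keying the load on the logical date keeps backfills correct.

```python
# airflow sketch for idea 4: retries plus backfill-friendly scheduling.
# assumes airflow 2.4+; dag and task names are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 3,                         # retry transient failures
    "retry_delay": timedelta(minutes=5),  # back off between attempts
}

def load_partition(ds, **_):
    # `ds` is the logical date, so a backfill run loads exactly
    # the partition for the date being replayed
    print(f"loading partition for {ds}")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # allow catchup / `airflow dags backfill` runs
    default_args=default_args,
):
    PythonOperator(task_id="load_partition", python_callable=load_partition)
```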
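
a minimal sketch for idea 5 in plain python as a stand-in for a stream processor; the window size and threshold are illustrative assumptions.

```python
# sliding-window sketch for idea 5: per-card event counts over a
# 60-second window, flagging bursts as anomalies.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 5  # more than 5 events per card per window looks anomalous

windows = defaultdict(deque)  # card_id -> timestamps within the window

def observe(card_id, ts):
    """Record one event; return True if the card breaches the threshold."""
    window = windows[card_id]
    window.append(ts)
    # evict timestamps that slid out of the window, so state stays bounded
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > THRESHOLD

# replay a burst: 7 events on one card at 10-second intervals
alerts = [observe("card-42", t) for t in range(0, 70, 10)]
print(alerts)  # [False, False, False, False, False, True, True]
```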
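
a minimal sketch for the upsert half of idea 6, with an in-memory view standing in for a warehouse MERGE: upserts keyed by primary key make replaying mixed ops a no-op.

```python
# cdc sketch for idea 6: apply an insert/update/delete change log with
# idempotent upserts, so replaying the log leaves the view unchanged.
def apply_changes(view, change_log):
    for op, key, row in change_log:
        if op in ("insert", "update"):
            view[key] = row      # upsert: last write for a key wins
        elif op == "delete":
            view.pop(key, None)  # deleting a missing key is a no-op
    return view

log = [
    ("insert", 1, {"id": 1, "status": "new"}),
    ("update", 1, {"id": 1, "status": "paid"}),
    ("delete", 2, None),
]

view = apply_changes({}, log)
replayed = apply_changes(dict(view), log)  # mixed ops, replayed
assert view == replayed == {1: {"id": 1, "status": "paid"}}  # consistent
```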
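
a minimal sketch for idea 7, with hypothetical feature values standing in for real offline/online store reads.

```python
# parity-check sketch for idea 7: compare offline (batch) feature values
# with online (serving) reads and report the mismatch %.
offline = {"user-1": 0.91, "user-2": 0.40, "user-3": 0.77}
online = {"user-1": 0.91, "user-2": 0.42, "user-3": 0.77}

TOLERANCE = 1e-6  # floats must agree within this to count as consistent

mismatches = sum(
    1
    for key, value in offline.items()
    if key not in online or abs(value - online[key]) > TOLERANCE
)
mismatch_pct = 100.0 * mismatches / len(offline)
print(f"offline-online mismatch: {mismatch_pct:.1f}%")  # 33.3% in this toy data
# the post's pass bar is mismatch_pct < 1.0 on real traffic
```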
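
a minimal sketch for idea 8 with hypothetical field names: a consumer contract checked at the producer boundary turns integration failures into detectable violations.

```python
# data-contract sketch for idea 8: validate producer rows against a
# consumer contract before integration; fields are hypothetical.
CONTRACT = {"order_id": str, "amount_cents": int, "currency": str}

def violations(row):
    missing = [f for f in CONTRACT if f not in row]
    wrong_type = [
        f for f, t in CONTRACT.items()
        if f in row and not isinstance(row[f], t)
    ]
    return missing + wrong_type

good = {"order_id": "o-1", "amount_cents": 1299, "currency": "EUR"}
bad = {"order_id": "o-2", "amount_cents": "12.99"}
print(violations(good))  # []
print(violations(bad))   # ['currency', 'amount_cents']
```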

for further ideas, see the appendix page