stage 5

data engineering core projects

goal: pipeline thinking.

very easy — pipeline awareness

  • csv cleaner with logging
  • daily aggregation script
  • json to csv transformer
  • log parser
  • data quality checker

focus: validation mindset.
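a minimal sketch of the first project on the list, a csv cleaner with logging. the sample data and function name are made up for illustration — the point is the validation mindset: count what you keep, count what you drop, and say so in the log:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("csv_cleaner")

def clean_rows(reader):
    """drop rows with any empty field; log what was kept and dropped."""
    kept, dropped = [], 0
    for row in reader:
        if all(v.strip() for v in row.values()):
            kept.append(row)
        else:
            dropped += 1
    log.info("kept %d rows, dropped %d", len(kept), dropped)
    return kept

raw = "name,age\nalice,30\nbob,\ncarol,25\n"
rows = clean_rows(csv.DictReader(io.StringIO(raw)))
```

the same shape (read → validate → report) carries over to the json transformer and the data quality checker.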

easy — structured pipelines

  • simple etl (csv → transform → sqlite)
  • scheduled ingestion (process new files only)
  • api ingestion with retry
  • bronze/silver/gold folder structure
  • streaming simulator (rolling summary)

focus: repeatability.
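the simple etl project above (csv → transform → sqlite) can be sketched in a few lines with the standard library. the table name, columns, and sample data are assumptions for the sketch; an in-memory database stands in for a file:

```python
import csv
import io
import sqlite3

raw = "city,temp_f\nparis,68\noslo,41\n"

# extract
rows = list(csv.DictReader(io.StringIO(raw)))

# transform: fahrenheit -> celsius
for r in rows:
    r["temp_c"] = round((float(r["temp_f"]) - 32) * 5 / 9, 1)

# load into sqlite (in-memory here; a file path makes it persistent)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weather (city TEXT, temp_c REAL)")
con.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [(r["city"], r["temp_c"]) for r in rows],
)
con.commit()
result = con.execute("SELECT city, temp_c FROM weather ORDER BY city").fetchall()
```

repeatability here means the script can be rerun end to end from the raw input with no manual steps in between.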

intermediate — reliability thinking

  • incremental load with watermark
  • change data capture simulation
  • parquet-based mini lakehouse
  • anomaly detection pipeline
  • airflow dag with dependencies

focus: state + orchestration.
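the incremental load with watermark is the core "state" idea in this tier: persist the highest timestamp already loaded, and each run only picks up records newer than it. a minimal sketch with made-up records (in a real pipeline the watermark would be stored in a file or table between runs):

```python
source = [
    {"id": 1, "ts": "2024-01-01"},
    {"id": 2, "ts": "2024-01-02"},
    {"id": 3, "ts": "2024-01-03"},
]

def incremental_load(records, watermark):
    """return only records newer than the watermark, plus the new watermark."""
    new = [r for r in records if r["ts"] > watermark]
    new_watermark = max((r["ts"] for r in new), default=watermark)
    return new, new_watermark

# first run: watermark says 2024-01-01 is already loaded
batch1, wm = incremental_load(source, "2024-01-01")
# second run with the advanced watermark: nothing new to load
batch2, wm2 = incremental_load(source, wm)
```

iso-8601 date strings sort lexicographically, which is why plain string comparison works here.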

moderately hard — production mindset

  • event-driven pipeline (producer → queue → consumer)
  • schema evolution handling
  • idempotent batch job
  • analytics api serving layer
  • full mini platform (ingest → transform → store → serve → log)

focus: robustness.
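the idempotent batch job is the smallest of these to demonstrate: key each write on a natural identifier so rerunning the same batch (after a crash, a retry, a backfill) leaves one row instead of two. a sketch using sqlite's `INSERT OR REPLACE` on a primary key — table and column names are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total REAL)")

def run_batch(con, day, total):
    # keyed on day: the same batch can run any number of times
    # and the table still holds exactly one row for that day
    con.execute("INSERT OR REPLACE INTO daily_totals VALUES (?, ?)", (day, total))
    con.commit()

run_batch(con, "2024-01-01", 100.0)
run_batch(con, "2024-01-01", 100.0)  # rerun: no duplicate
count = con.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]
```

the same property matters for the event-driven pipeline above: a consumer that may see a message twice needs writes that are safe to repeat.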