cloud infrastructure
cover aws, gcp, and azure services, plus docker, kubernetes, infrastructure as code, and ci/cd pipelines.
key concepts
- aws: s3, ec2, lambda, glue, emr
- gcp: bigquery, dataflow; azure: data factory
- docker container layers, k8s pods/services
- ci/cd pipelines and automation
- iam, role-based access control
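The iam item above can be practiced with a toy policy evaluator. This is a minimal sketch, not the real IAM engine: it assumes a simplified statement shape (`effect`, `actions`, `resources`), uses shell-style wildcards, and the role `etl_role` and bucket name `etl-bucket` are hypothetical. It keeps the two rules that matter most when reasoning about access: requests are denied by default, and an explicit deny overrides any allow.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Return True if the request is allowed under a simplified IAM-style model."""
    allowed = False
    for stmt in statements:
        matches = (
            any(fnmatchcase(action, pattern) for pattern in stmt["actions"])
            and any(fnmatchcase(resource, pattern) for pattern in stmt["resources"])
        )
        if not matches:
            continue
        if stmt["effect"] == "Deny":
            return False  # an explicit deny always wins
        allowed = True
    return allowed  # nothing matched an allow: default deny


# hypothetical role for a pipeline worker (names are illustrative only)
etl_role = [
    {
        "effect": "Allow",
        "actions": ["s3:GetObject", "s3:PutObject"],
        "resources": ["arn:aws:s3:::etl-bucket/*"],
    },
    {
        "effect": "Deny",
        "actions": ["s3:*"],
        "resources": ["arn:aws:s3:::etl-bucket/secrets/*"],
    },
]
```

Calling `is_allowed(etl_role, "s3:GetObject", "arn:aws:s3:::etl-bucket/raw/a.parquet")` exercises the allow path, while the same action on a key under `secrets/` hits the explicit deny. Real IAM also handles conditions, principals, and case-insensitive action names, which this sketch leaves out.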
explanation practice
- cloud diagram for airflow pipeline
- docker image layering
- k8s pod lifecycle
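For the docker image layering item above, a small model can help when explaining why instruction order matters. This is a sketch under loud assumptions: each layer id is just a hash of the instruction text plus its parent id, whereas real Docker also checks the contents of files used by `COPY`/`ADD` and treats some instructions as metadata only. The dockerfile lines and the helper names `layer_ids` and `reusable_layers` are hypothetical.

```python
import hashlib


def layer_ids(instructions):
    """Return one id per instruction; each id depends on the instruction and its parent."""
    ids = []
    parent = ""
    for instruction in instructions:
        digest = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()[:12]
        ids.append(digest)
        parent = digest  # the next layer is stacked on top of this one
    return ids


def reusable_layers(old, new):
    """Count the leading layers a rebuild could take from cache."""
    count = 0
    for old_id, new_id in zip(layer_ids(old), layer_ids(new)):
        if old_id != new_id:
            break  # the first change invalidates this layer and everything after it
        count += 1
    return count


# hypothetical dockerfile for a python pipeline image
before = [
    "FROM python:3.11-slim",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY . .",
]
after = before[:3] + ["COPY ./src ./src"]  # only the last step changed
reordered = [before[0], before[3], before[1], before[2]]  # source copied first
```

Comparing `before` with `after` keeps the base image and dependency layers, while `reordered` loses everything after the base image. That is the usual argument for copying dependency manifests before application code.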
projects
1. airflow on aws ec2
- s3 + rds backend orchestration
2. lambda-based etl
- small batch load automation
3. spark job on emr
- read parquet from s3 and process it in spark
4. github actions ci/cd
- trigger airflow or etl jobs
5. dockerize pipelines
- docker-compose or kubernetes deployment
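As a starting point for the lambda-based etl project above, the sketch below shows the shape of a handler triggered by an s3 event notification. It is not a full implementation: fetching and loading are left as comments (boto3 would typically handle them), and the `transform` function, the `id`/`amount` columns, and the bucket and key names used with it are hypothetical. The event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the standard s3 notification format.

```python
import csv
import io
from urllib.parse import unquote_plus


def s3_targets(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    targets = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # object keys arrive URL-encoded in S3 event notifications
        targets.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return targets


def transform(csv_text):
    """Hypothetical transform: drop rows with an empty 'amount', cast the rest to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["amount"].strip():
            rows.append({"id": row["id"], "amount": float(row["amount"])})
    return rows


def handler(event, context):
    """Lambda entry point; reading and writing data are intentionally left out."""
    targets = s3_targets(event)
    # in a real function: read each object (for example with boto3),
    # run transform() on its contents, then write the rows to the destination
    return {"objects": [f"s3://{bucket}/{key}" for bucket, key in targets]}
```

Keeping `s3_targets` and `transform` as plain functions means the batch logic can be tested locally with a hand-written event before anything is deployed.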