  • training@skillsforafrica.org
    info@skillsforafrica.org

Data Engineering With Apache Spark And Airflow Training Course in Kenya

Introduction

The Data Engineering with Apache Spark and Airflow Training Course offers a future-ready curriculum designed for professionals looking to master modern big data processing, orchestration, and automation. As businesses generate massive volumes of structured and unstructured data, the need for scalable data pipelines and real-time analytics has surged. This training empowers participants with hands-on skills in building robust, scalable, and fault-tolerant data engineering workflows using industry-standard tools like Apache Spark and Apache Airflow.

In this comprehensive course, learners will explore distributed computing with Spark, implement ETL processes, and schedule complex data workflows with Airflow. Through real-world use cases and guided projects, participants will gain proficiency in data ingestion, transformation, optimization, and scheduling. Whether you're transitioning into a data engineering role or upgrading your skills in advanced data platforms, this course provides a powerful foundation to design and manage modern data infrastructure.

Target Audience

  • Data Engineers and Data Scientists
  • Business Intelligence Analysts
  • Software Engineers transitioning to data roles
  • Cloud Engineers and DevOps professionals
  • ETL Developers and Data Architects
  • AI/ML Engineers needing reliable data pipelines
  • Technical Project Managers in data-driven projects

Course Objectives

  • Understand the architecture and components of Apache Spark and Airflow
  • Build scalable ETL pipelines using PySpark and Spark SQL
  • Automate workflow scheduling and monitoring with Airflow DAGs
  • Integrate data sources from APIs, databases, and cloud platforms
  • Design fault-tolerant and efficient batch and stream processing jobs
  • Implement data quality checks and error handling in pipelines
  • Schedule dynamic, dependency-based data tasks in Airflow
  • Optimize Spark jobs for memory, speed, and parallel execution
  • Deploy data pipelines in production environments
  • Monitor, debug, and document end-to-end data workflows

Duration

10 Days

Course Content

Module 1: Foundations of Data Engineering

  • Overview of modern data architecture
  • Batch vs streaming pipelines
  • Data engineering lifecycle
  • Roles and responsibilities
  • Key tools and frameworks in the ecosystem

Module 2: Apache Spark Core Concepts

  • Spark architecture and RDDs
  • Transformations and actions in Spark
  • SparkSession and lazy evaluation
  • Deploying Spark in local and cluster modes
  • DataFrame operations in PySpark
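
The distinction between transformations and actions under lazy evaluation can be illustrated without a cluster. The sketch below is plain Python, not the Spark API: `LazyDataset` and its methods are hypothetical names that only mimic the idea that transformations describe work and an action triggers it.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Iterable[int], log: List[str]) -> None:
        self._source = source
        self._log = log  # records when element-level work actually runs

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: wraps the source in a generator; nothing runs yet.
        def gen() -> Iterator[int]:
            for x in self._source:
                self._log.append(f"map({x})")
                yield fn(x)
        return LazyDataset(gen(), self._log)

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        # Transformation: also deferred until an action consumes it.
        def gen() -> Iterator[int]:
            for x in self._source:
                if pred(x):
                    yield x
        return LazyDataset(gen(), self._log)

    def collect(self) -> List[int]:
        # Action: pulls every element through the chained generators.
        return list(self._source)


events: List[str] = []
pipeline = (
    LazyDataset(range(1, 6), events)
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 1)
)
planned_only = list(events)   # snapshot taken before any action is called
result = pipeline.collect()   # the action triggers the work
```

Here `planned_only` stays empty because building the pipeline performs no element-level work; the log fills only once `collect()` runs.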

Module 3: Data Transformation with Spark SQL

  • Structured data processing with DataFrames
  • Writing SQL queries on Spark tables
  • Joins, filters, and aggregations
  • Schema inference and data types
  • Writing to and reading from Parquet, JSON, and CSV

Module 4: Advanced Spark Optimization Techniques

  • Catalyst optimizer and Tungsten engine
  • Partitioning and shuffling strategies
  • Broadcast joins and caching
  • Tuning memory and execution settings
  • Monitoring Spark applications
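
The idea behind a broadcast join can be sketched in plain Python: the small table is turned into a lookup that every partition of the large table can read locally, so the large table never has to be shuffled. This is an illustration of the concept only, not Spark code; the function name and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def broadcast_join(partitions: List[List[Row]], small_table: List[Row], key: str) -> List[List[Row]]:
    """Inner-join each partition against a small table using a shared lookup."""
    # Build the lookup once; in a cluster this copy would be sent to every worker.
    lookup = {row[key]: row for row in small_table}
    joined: List[List[Row]] = []
    for part in partitions:
        out: List[Row] = []
        for row in part:
            match = lookup.get(row[key])
            if match is not None:          # inner join: unmatched rows are dropped
                out.append({**row, **match})
        joined.append(out)                 # partition boundaries are preserved
    return joined


# Hypothetical sample data: a "large" table split into two partitions.
order_partitions = [
    [{"order_id": 1, "country": "KE"}, {"order_id": 2, "country": "RW"}],
    [{"order_id": 3, "country": "TZ"}, {"order_id": 4, "country": "XX"}],
]
countries = [
    {"country": "KE", "city": "Nairobi"},
    {"country": "RW", "city": "Kigali"},
    {"country": "TZ", "city": "Dar es Salaam"},
]
joined = broadcast_join(order_partitions, countries, key="country")
```

Each partition is processed independently against the same lookup, which is why this strategy only suits cases where one side of the join is small enough to copy everywhere.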

Module 5: Introduction to Apache Airflow

  • Airflow architecture and components
  • DAGs, operators, and task scheduling
  • Installing and configuring Airflow
  • Airflow UI and command-line interface
  • Understanding task dependencies
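
Task dependencies in a DAG amount to a partial order: a task may run only after everything it depends on has finished. The sketch below shows that idea with Python's standard-library `graphlib` (Python 3.9 or later) rather than Airflow itself; the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks that must finish before it starts.
dependencies: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "quality_check": {"join"},
    "load": {"quality_check"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps: Dict[str, Set[str]]) -> bool:
    """A workflow with a cycle is not a DAG and cannot be scheduled."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
```

The two extract tasks have no dependency on each other, so a scheduler is free to run them in parallel; only their position relative to `join` is fixed.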

Module 6: Building ETL Pipelines with Airflow

  • PythonOperator, BashOperator, and custom plugins
  • Connecting to PostgreSQL, S3, BigQuery, and APIs
  • Handling retries, alerts, and SLAs
  • Dynamic DAG generation
  • Managing secrets and connections
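
Retry handling is usually configured rather than hand-written (Airflow tasks accept settings such as `retries` and `retry_delay`), but the underlying behaviour is easy to show in plain Python. The sketch below is an illustration under that assumption: `run_with_retries` and `flaky_extract` are hypothetical names, and the delay function is injectable so the example does not actually sleep.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying up to `retries` times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise                          # retries exhausted: surface the error
            sleep(base_delay * 2 ** attempt)   # waits 1x, 2x, 4x ... the base delay
            attempt += 1


# Hypothetical task that fails twice before succeeding.
calls: Dict[str, int] = {"n": 0}


def flaky_extract() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "rows loaded"


delays: List[float] = []
outcome = run_with_retries(flaky_extract, retries=3, base_delay=1.0, sleep=delays.append)
```

Passing `delays.append` in place of `time.sleep` records the backoff schedule instead of waiting, which also makes the behaviour straightforward to test.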

Module 7: Data Quality and Validation

  • Adding data checks in Spark
  • Custom Airflow sensors and validators
  • Ensuring schema consistency
  • Error tracking and reporting
  • Best practices for reliable data ingestion
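
A schema consistency check can be as simple as comparing each record against an expected set of columns and types, then separating clean rows from rejected ones. The sketch below uses plain Python dictionaries rather than Spark DataFrames; the schema, column names, and sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA: Dict[str, type] = {"order_id": int, "amount": float, "country": str}


def validate_rows(rows: List[Row], schema: Dict[str, type]) -> Tuple[List[Row], List[Tuple[int, List[str]]]]:
    """Split rows into valid rows and (row index, list of problems) entries."""
    valid: List[Row] = []
    errors: List[Tuple[int, List[str]]] = []
    for i, row in enumerate(rows):
        problems: List[str] = []
        missing = sorted(set(schema) - set(row))
        if missing:
            problems.append(f"missing columns: {missing}")
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                problems.append(
                    f"{col}: expected {typ.__name__}, got {type(row[col]).__name__}"
                )
        if problems:
            errors.append((i, problems))   # keep the reason for error reporting
        else:
            valid.append(row)
    return valid, errors


sample_rows: List[Row] = [
    {"order_id": 1, "amount": 19.99, "country": "KE"},
    {"order_id": "2", "amount": 5.0, "country": "RW"},   # wrong type for order_id
    {"order_id": 3, "country": "TZ"},                    # amount is missing
]
valid_rows, row_errors = validate_rows(sample_rows, EXPECTED_SCHEMA)
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, is what makes later error tracking and reporting possible.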

Module 8: Workflow Monitoring and Logging

  • Enabling Airflow logs and metrics
  • Using Grafana and Prometheus
  • Setting up alerting systems
  • Auditing historical DAG runs
  • Debugging failed tasks

Module 9: Stream Processing with Spark Structured Streaming

  • Streaming APIs in PySpark
  • Windowing and watermarking
  • Reading from Kafka and filesystems
  • Writing streaming outputs
  • Handling late data and checkpointing
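
Windowing and watermarking can be illustrated with a simplified model: events are grouped into fixed-size (tumbling) windows by event time, and the watermark trails the latest event time seen by an allowed lateness. An event is dropped once its window has closed behind the watermark. This is a plain-Python sketch of the concept, not Structured Streaming's exact semantics; the function name and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def windowed_counts(
    event_times: List[int], window_size: int, allowed_lateness: int
) -> Tuple[Dict[int, int], List[int]]:
    """Count events per tumbling window, dropping events that arrive too late."""
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[int] = []
    max_seen: Optional[int] = None
    for t in event_times:                      # list order = arrival order
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - allowed_lateness
        window_start = (t // window_size) * window_size
        if window_start + window_size <= watermark:
            dropped.append(t)                  # window already closed: too late
            continue
        counts[window_start] += 1
    return dict(counts), dropped


# Event times in arrival order; 8 and 9 both arrive out of order.
arrivals = [1, 4, 12, 8, 27, 9, 21]
counts, dropped = windowed_counts(arrivals, window_size=10, allowed_lateness=5)
```

In this run the event at time 8 is late but still accepted, because the watermark (12 - 5 = 7) has not yet passed the end of its window, while the event at time 9 arrives after the watermark has moved to 22 and is dropped.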

Module 10: Deploying Data Pipelines

  • Containerizing Airflow with Docker
  • Running Spark on Kubernetes or EMR
  • CI/CD for data workflows
  • Airflow in production (Astronomer, Cloud Composer)
  • Handling scale and concurrency

Module 11: Integrating Machine Learning Workflows

  • Using Spark MLlib for model training
  • Scheduling model runs with Airflow
  • Logging and versioning ML experiments
  • Passing models through pipelines
  • Serving model outputs through batch predictions

Training Approach

This course is delivered by skilled trainers with extensive knowledge and professional experience in the field. It is taught in English through a mix of theory, practical activities, group discussions, and case studies. Course manuals and additional training materials will be provided to participants upon completion of the training.

Tailor-Made Course

This course can also be tailor-made to meet your organization's requirements. For further inquiries, please contact us by email at info@skillsforafrica.org or training@skillsforafrica.org, or by telephone on +254 702 249 449.

Training Venue

The training will be held at the Skills for Africa Training Institute Training Centre. We also offer group training at a requested location anywhere in the world. The course fee covers tuition, training materials, two refreshment breaks, and a buffet lunch.

Visa applications, travel expenses, airport transfers, dinners, accommodation, insurance, and other personal expenses are the responsibility of the participant.

Certification

Participants will be issued a Skills for Africa Training Institute certificate upon completion of this course.

Airport Pickup and Accommodation

Airport pickup and accommodation are arranged upon request. To book, contact our Training Coordinator by email at info@skillsforafrica.org or training@skillsforafrica.org, or by telephone on +254 702 249 449.

Terms of Payment: Unless otherwise agreed between the two parties, payment of the course fee should be made 10 working days before commencement of the training.

Course Schedule
Dates (DD/MM/YYYY) Fees Location
07/07/2025 - 18/07/2025 $3000 Nairobi, Kenya
14/07/2025 - 25/07/2025 $5500 Johannesburg, South Africa
14/07/2025 - 25/07/2025 $3000 Nairobi, Kenya
04/08/2025 - 15/08/2025 $3000 Nairobi, Kenya
11/08/2025 - 22/08/2025 $3500 Mombasa, Kenya
18/08/2025 - 29/08/2025 $3000 Nairobi, Kenya
01/09/2025 - 12/09/2025 $3000 Nairobi, Kenya
08/09/2025 - 19/09/2025 $4500 Dar es Salaam, Tanzania
15/09/2025 - 26/09/2025 $3000 Nairobi, Kenya
06/10/2025 - 17/10/2025 $3000 Nairobi, Kenya
13/10/2025 - 24/10/2025 $4500 Kigali, Rwanda
20/10/2025 - 31/10/2025 $3000 Nairobi, Kenya
03/11/2025 - 14/11/2025 $3000 Nairobi, Kenya
10/11/2025 - 21/11/2025 $3500 Mombasa, Kenya
17/11/2025 - 28/11/2025 $3000 Nairobi, Kenya
01/12/2025 - 12/12/2025 $3000 Nairobi, Kenya
08/12/2025 - 19/12/2025 $3000 Nairobi, Kenya