This cutting-edge training course on End-to-End Data Engineering on the Lakehouse Architecture is designed to equip data professionals with the practical skills and modern tools needed to build scalable, unified, and efficient data pipelines using the Lakehouse paradigm. By combining the strengths of data lakes and data warehouses, the Lakehouse architecture enables real-time analytics, machine learning, and BI directly on lake data without complex data movement. Participants will master data ingestion, transformation, governance, and orchestration using technologies such as Apache Spark, Delta Lake, Databricks, and open table formats like Apache Iceberg and Apache Hudi, positioning themselves at the forefront of data engineering innovation.
Duration: 10 Days
Target Audience
- Data Engineers
- Cloud Data Architects
- BI and Analytics Engineers
- Data Platform Engineers
- ETL Developers
- Data Lake Administrators
- ML Engineers
- Software Engineers transitioning to data roles
Course Objectives
- Understand the core principles of Lakehouse architecture
- Learn how to implement scalable ELT pipelines on data lakes
- Integrate Delta Lake, Apache Iceberg, or Apache Hudi for data reliability
- Design efficient data lake schemas and partitioning strategies
- Use Apache Spark and SQL for large-scale data transformation
- Manage metadata, transactions, and schema evolution
- Enable real-time and batch data access on a unified platform
- Orchestrate workflows with tools like Airflow and dbt
- Ensure data quality, governance, and lineage
- Optimize performance, cost, and data retrieval speeds
- Build end-to-end pipelines for BI, analytics, and ML workloads
Module 1: Introduction to Lakehouse Architecture
- Lakehouse vs. traditional data warehouse and data lake
- Benefits of unifying storage and analytics
- Open formats: Delta Lake, Iceberg, Hudi
- Key components and ecosystem overview
- Use cases in analytics, BI, and machine learning
Module 2: Ingestion Strategies for the Lakehouse
- ELT vs. ETL in Lakehouse environments
- Streaming vs. batch ingestion
- Tools for ingestion: Apache NiFi, Airbyte, Kafka, Spark
- Handling schema drift and data validation
- Data partitioning and file formats
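To give a flavour of what this module covers, here is a minimal batch-ingestion sketch in PySpark, assuming a Spark session configured with Delta Lake support; the bucket paths and column names are illustrative only:

    # Minimal batch-ingestion sketch: land raw CSV files in a bronze Delta table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3://landing-zone/orders/"))

    # Tag each row with ingestion metadata, then append to the bronze layer.
    # mergeSchema lets new upstream columns land without failing the job,
    # which is one common answer to schema drift.
    (raw.withColumn("_ingested_at", F.current_timestamp())
        .write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .partitionBy("order_date")
        .save("s3://lakehouse/bronze/orders"))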
Module 3: Working with Apache Spark on the Lakehouse
- Introduction to Spark DataFrames and SQL
- Optimizing transformations and joins
- Structured Streaming for real-time pipelines
- Performance tuning and job optimization
- Working with large-scale JSON, Parquet, and ORC files
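A short illustrative example of the DataFrame transformations and join optimizations discussed in this module; the table paths and columns are hypothetical:

    # An aggregation plus a broadcast join, a common Spark optimization.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("spark-transforms").getOrCreate()

    orders = spark.read.format("delta").load("s3://lakehouse/silver/orders")
    countries = spark.read.format("delta").load("s3://lakehouse/silver/countries")

    # Broadcasting the small dimension table avoids shuffling the large side.
    revenue = (orders
               .join(broadcast(countries), "country_code")
               .groupBy("country_name")
               .agg(F.sum("amount").alias("total_revenue"))
               .orderBy(F.desc("total_revenue")))

    revenue.show(10)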
Module 4: Delta Lake Fundamentals
- ACID transactions and time travel
- Schema enforcement and evolution
- Managing Delta logs and versions
- Vacuuming and data retention policies
- Delta Lake vs. traditional lake formats
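The following minimal sketch, assuming a Delta-enabled Spark session and an illustrative table path, touches on time travel, the transaction log, and vacuuming:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-basics").getOrCreate()
    path = "s3://lakehouse/silver/orders"

    # Read the table as of an earlier version (time travel).
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

    # Inspect the transaction log: one row per commit.
    DeltaTable.forPath(spark, path).history().select("version", "operation").show()

    # Remove files no longer referenced within the retention window
    # (168 hours = 7 days); this trades time-travel depth for storage cost.
    DeltaTable.forPath(spark, path).vacuum(retentionHours=168)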
Module 5: Apache Iceberg and Hudi Deep Dive
- Table structure and metadata layers
- Data compaction and clustering
- Querying with Spark, Trino, and Presto
- Use cases for versioning and rollback
- Performance comparisons and trade-offs
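As a small taste of Iceberg's metadata layers, the sketch below assumes a Spark session already configured with an Iceberg catalog named lake; the catalog and table names are invented for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iceberg-demo").getOrCreate()

    # Create an Iceberg table with a hidden day-level partition transform.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS lake.db.events (
            event_id BIGINT,
            event_ts TIMESTAMP,
            payload  STRING
        ) USING iceberg
        PARTITIONED BY (days(event_ts))
    """)

    # Iceberg exposes its metadata layers as queryable tables; snapshots
    # underpin the versioning and rollback use cases covered above.
    spark.sql("SELECT snapshot_id, committed_at FROM lake.db.events.snapshots").show()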
Module 6: Unified Data Modeling and Storage Design
- Bronze, Silver, and Gold layer modeling
- Medallion architecture design principles
- Data normalization and denormalization
- Partitioning, bucketing, and clustering strategies
- Handling slowly changing dimensions (SCDs)
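A compact medallion-architecture sketch in PySpark, with invented paths and columns, showing how data is refined from bronze to silver to gold:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("medallion").getOrCreate()

    bronze = spark.read.format("delta").load("s3://lakehouse/bronze/orders")

    # Silver: deduplicate, enforce types, drop obviously bad rows.
    silver = (bronze
              .dropDuplicates(["order_id"])
              .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
              .filter(F.col("amount") > 0))
    silver.write.format("delta").mode("overwrite").save("s3://lakehouse/silver/orders")

    # Gold: a business-level aggregate ready for BI tools.
    gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))
    gold.write.format("delta").mode("overwrite").save("s3://lakehouse/gold/customer_ltv")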
Module 7: Data Transformation with dbt on the Lakehouse
- Introduction to dbt core and dbt Cloud
- Building modular SQL models
- Testing and documenting transformations
- Orchestrating dbt with Airflow
- Lineage visualization and deployment best practices
Module 8: Metadata Management and Governance
- Cataloging data with Unity Catalog, Hive Metastore, AWS Glue
- Implementing data classifications and tags
- Role-based access control and fine-grained permissions
- Managing table versions and audit logs
- Lineage tracking with OpenMetadata or DataHub
Module 9: Workflow Orchestration and Automation
- Creating DAGs in Airflow for Lakehouse pipelines
- Managing dependencies and retries
- Scheduling workflows and integrating alerts
- Parameterization and configuration
- Using Prefect or Dagster as alternatives
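For orientation, here is a minimal Airflow DAG of the kind built in this module, assuming Airflow 2.4 or newer; the task callables are placeholders:

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest():
        print("load raw files into the bronze layer")

    def transform():
        print("refine bronze data into silver and gold tables")

    def validate():
        print("run data-quality checks on the outputs")

    with DAG(
        dag_id="lakehouse_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        t1 = PythonOperator(task_id="ingest", python_callable=ingest)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="validate", python_callable=validate)
        t1 >> t2 >> t3  # dependencies: ingest before transform before validate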
Module 10: Data Quality and Observability
- Writing validation rules with Great Expectations
- Detecting anomalies and outliers in data
- Monitoring freshness, volume, and distribution
- Building dashboards with Superset or Grafana
- Incident management and resolution
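The hand-rolled PySpark checks below illustrate the kinds of rules this module formalizes with Great Expectations; the table path and thresholds are illustrative, and this is not a Great Expectations API example:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()
    df = spark.read.format("delta").load("s3://lakehouse/silver/orders")

    total = df.count()
    null_keys = df.filter(F.col("order_id").isNull()).count()
    bad_amounts = df.filter(F.col("amount") <= 0).count()

    # Fail loudly so the orchestrator can alert and halt downstream tasks.
    assert total > 0, "freshness check failed: table is empty"
    assert null_keys == 0, f"{null_keys} rows have a null order_id"
    assert bad_amounts / total < 0.01, "more than 1% of amounts are non-positive"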
Module 11: Real-Time Analytics and Streaming
- Streaming ingestion with Kafka and Spark Structured Streaming
- Building real-time dashboards and microservices
- Event-time windowing and watermarking
- Use cases in IoT, finance, and e-commerce
- Trade-offs of stream vs. micro-batch
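A minimal streaming sketch, assuming a Spark session with the Kafka connector and Delta support available; the broker address, topic, and paths are invented:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream-demo").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "clickstream")
              .load()
              .selectExpr("CAST(value AS STRING) AS value",
                          "timestamp AS event_time"))

    # A 10-minute watermark bounds streaming state: events arriving later
    # than that are dropped, keeping aggregations from growing unbounded.
    counts = (events
              .withWatermark("event_time", "10 minutes")
              .groupBy(F.window("event_time", "5 minutes"))
              .count())

    query = (counts.writeStream
             .format("delta")
             .outputMode("append")
             .option("checkpointLocation", "s3://lakehouse/_chk/clicks")
             .start("s3://lakehouse/gold/click_counts"))
    query.awaitTermination()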
Module 12: Machine Learning on the Lakehouse
- Integrating ML models into pipelines
- Feature engineering and feature stores
- Using MLflow for tracking and deployment
- Batch inference vs. real-time scoring
- Lakehouse as a foundation for MLOps
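A small MLflow tracking sketch using a toy scikit-learn model; the run name, parameters, and dataset are placeholders:

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, random_state=42)

    # Log parameters, metrics, and the model itself so the run is
    # reproducible and the artifact can later be deployed for scoring.
    with mlflow.start_run(run_name="demo"):
        model = LogisticRegression(max_iter=200).fit(X, y)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")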
Module 13: Performance Optimization and Cost Management
- Query optimization with Z-Ordering and caching
- Choosing the right file size and format
- Auto-scaling compute resources
- Storage tiering and archival
- Tracking cost per query and job
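Two of the optimizations covered in this module, sketched via Spark SQL and assuming a runtime with Delta Lake's OPTIMIZE support; the table names are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("optimize").getOrCreate()

    # Compact small files and Z-order by a frequently filtered column so
    # queries on customer_id can skip unrelated files entirely.
    spark.sql("OPTIMIZE lakehouse.silver_orders ZORDER BY (customer_id)")

    # Caching a hot table avoids repeated scans within a session.
    spark.sql("CACHE TABLE lakehouse.gold_customer_ltv")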
Module 14: Security and Compliance in the Lakehouse
- Implementing encryption at rest and in transit
- Fine-grained access with row- and column-level security
- Data masking and tokenization techniques
- GDPR and HIPAA compliance measures
- Audit trails and anomaly detection
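An illustrative masking sketch in PySpark; the column names are invented, and a real deployment would pair this with key management and access-control policies:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("masking").getOrCreate()
    df = spark.read.format("delta").load("s3://lakehouse/silver/customers")

    # Hash a direct identifier (pseudonymization) and redact all but the
    # last four digits of a card number, then drop the raw columns.
    masked = (df
              .withColumn("email_token", F.sha2(F.col("email"), 256))
              .withColumn("card_masked",
                          F.concat(F.lit("****-****-****-"),
                                   F.substring("card_number", -4, 4)))
              .drop("email", "card_number"))

    masked.write.format("delta").mode("overwrite").save("s3://lakehouse/gold/customers_masked")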
Module 15: Building Production-Ready Lakehouse Pipelines
- CI/CD and version control integration
- Deploying and monitoring end-to-end workflows
- Handling failures, retries, and rollbacks
- Documentation and handover practices
- Case study: From ingestion to BI using the Lakehouse stack
Training Approach
This course will be delivered by our skilled trainers, who have vast knowledge and experience as expert professionals in their fields. The course is taught in English through a mix of theory, practical activities, group discussions, and case studies. Participants will be provided with course manuals and additional training materials upon completion of the training.
Tailor-Made Course
This course can also be tailor-made to meet an organization's requirements. For further inquiries, please contact us at: Email: info@skillsforafrica.org, training@skillsforafrica.org Tel: +254 702 249 449
Training Venue
The training will be held at our Skills for Africa Training Institute Training Centre. We also offer group training at a requested location anywhere in the world. The course fee covers tuition, training materials, two break refreshments, and a buffet lunch.
Visa applications, travel expenses, airport transfers, dinners, accommodation, insurance, and other personal expenses are the responsibility of the participant.
Certification
Participants will be issued with a Skills for Africa Training Institute certificate upon completion of this course.
Airport Pickup and Accommodation
Airport pickup and accommodation are arranged upon request. For bookings, contact our Training Coordinator via Email: info@skillsforafrica.org, training@skillsforafrica.org Tel: +254 702 249 449
Terms of Payment: Unless otherwise agreed between the two parties, payment of the course fee should be made 7 working days before commencement of the training.