End-to-end Data Engineering On The Lakehouse Architecture Training Course in Senegal

This cutting-edge training course on End-to-End Data Engineering on the Lakehouse Architecture equips data professionals with the practical skills and modern tools needed to build scalable, unified, and efficient data pipelines using the Lakehouse paradigm. By combining the strengths of data lakes and data warehouses, the Lakehouse architecture enables real-time analytics, machine learning, and BI directly on raw data, without complex data movement. Participants will master data ingestion, transformation, governance, and orchestration using technologies such as Apache Spark, Delta Lake, Databricks, and open table formats like Apache Iceberg and Apache Hudi, positioning themselves at the forefront of data engineering innovation.

Duration: 10 Days

Target Audience

  • Data Engineers
  • Cloud Data Architects
  • BI and Analytics Engineers
  • Data Platform Engineers
  • ETL Developers
  • Data Lake Administrators
  • ML Engineers
  • Software Engineers transitioning to data roles

Course Objectives

  • Understand the core principles of Lakehouse architecture
  • Learn how to implement scalable ELT pipelines on data lakes
  • Integrate Delta Lake, Apache Iceberg, or Apache Hudi for data reliability
  • Design efficient data lake schemas and partitioning strategies
  • Use Apache Spark and SQL for large-scale data transformation
  • Manage metadata, transactions, and schema evolution
  • Enable real-time and batch data access on a unified platform
  • Orchestrate workflows with tools like Airflow and dbt
  • Ensure data quality, governance, and lineage
  • Optimize performance, cost, and data retrieval speeds
  • Build end-to-end pipelines for BI, analytics, and ML workloads

Module 1: Introduction to Lakehouse Architecture

  • Lakehouse vs. traditional data warehouse and data lake
  • Benefits of unifying storage and analytics
  • Open formats: Delta Lake, Iceberg, Hudi
  • Key components and ecosystem overview
  • Use cases in analytics, BI, and machine learning

Module 2: Ingestion Strategies for the Lakehouse

  • ELT vs. ETL in Lakehouse environments
  • Streaming vs. batch ingestion
  • Tools for ingestion: Apache NiFi, Airbyte, Kafka, Spark
  • Handling schema drift and data validation
  • Data partitioning and file formats
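The partitioning strategies above commonly reduce to a Hive-style directory layout on the lake. A minimal Python sketch of building such a partition path (the bucket and table names are illustrative only):

```python
from datetime import date

def partition_path(table_root: str, event_date: date, region: str) -> str:
    """Build a Hive-style partition path (key=value directories), the
    layout most lake engines use to prune files at query time."""
    return (
        f"{table_root}/region={region}"
        f"/year={event_date.year}"
        f"/month={event_date.month:02d}"
        f"/day={event_date.day:02d}"
    )

path = partition_path("s3://lake/bronze/orders", date(2024, 3, 7), "emea")
```

Queries that filter on region and date can then skip every directory outside the requested partitions.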

Module 3: Working with Apache Spark on the Lakehouse

  • Introduction to Spark DataFrames and SQL
  • Optimizing transformations and joins
  • Structured Streaming for real-time pipelines
  • Performance tuning and job optimization
  • Working with large-scale JSON, Parquet, and ORC files
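To illustrate the kind of transformation Spark distributes across a cluster, here is a plain-Python analogue of a DataFrame aggregation such as `df.groupBy("country").agg(sum("amount"))` (the sample rows are illustrative; no Spark installation is assumed):

```python
from collections import defaultdict

rows = [
    {"country": "SN", "amount": 120.0},
    {"country": "SN", "amount": 80.0},
    {"country": "KE", "amount": 50.0},
]

# Equivalent in spirit to: df.groupBy("country").agg(sum("amount"))
totals = defaultdict(float)
for row in rows:
    totals[row["country"]] += row["amount"]
```

Spark performs the same group-and-sum, but partitions the rows across executors and shuffles by key before aggregating.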

Module 4: Delta Lake Fundamentals

  • ACID transactions and time travel
  • Schema enforcement and evolution
  • Managing Delta logs and versions
  • Vacuuming and data retention policies
  • Delta Lake vs. traditional lake formats
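Delta Lake's time travel rests on the idea that every commit produces a new readable version of the table. A toy Python sketch of that versioning model (not the actual Delta log format, which stores JSON transaction files and Parquet checkpoints):

```python
class VersionedTable:
    """Toy illustration of Delta-style versioning: each commit appends a
    new snapshot, and older versions stay readable ("time travel")."""

    def __init__(self):
        self._versions = []               # list of snapshots (rows per commit)

    def commit(self, rows):
        self._versions.append(list(rows))
        return len(self._versions) - 1    # version number of this commit

    def read(self, version=None):
        if version is None:
            version = len(self._versions) - 1   # default: latest version
        return self._versions[version]

t = VersionedTable()
t.commit([{"id": 1, "qty": 5}])   # version 0
t.commit([{"id": 1, "qty": 7}])   # version 1
```

In Delta SQL the equivalent read is `SELECT * FROM tbl VERSION AS OF 0`; vacuuming removes the files backing versions past the retention period.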

Module 5: Apache Iceberg and Hudi Deep Dive

  • Table structure and metadata layers
  • Data compaction and clustering
  • Querying with Spark, Trino, Presto
  • Use cases for versioning and rollback
  • Performance comparisons and trade-offs

Module 6: Unified Data Modeling and Storage Design

  • Bronze, Silver, and Gold layer modeling
  • Medallion architecture design principles
  • Data normalization and denormalization
  • Partitioning, bucketing, and clustering strategies
  • Handling slowly changing dimensions (SCDs)
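The SCD Type 2 pattern mentioned above keeps full history by closing the current row and opening a new one whenever an attribute changes. A minimal sketch in Python (the customer key and attributes are illustrative):

```python
from datetime import date

def scd2_upsert(history, key, new_attrs, as_of):
    """Minimal SCD Type 2: close the open row for `key` (set its end
    date) and append a new row carrying the changed attributes."""
    for row in history:
        if row["key"] == key and row["end_date"] is None:
            if row["attrs"] == new_attrs:
                return history            # no change: keep current row open
            row["end_date"] = as_of       # close the current version
    history.append({"key": key, "attrs": new_attrs,
                    "start_date": as_of, "end_date": None})
    return history

hist = []
scd2_upsert(hist, "cust-1", {"city": "Dakar"}, date(2024, 1, 1))
scd2_upsert(hist, "cust-1", {"city": "Thies"}, date(2024, 6, 1))
```

On the Lakehouse the same logic is usually expressed as a `MERGE INTO` statement against a Delta, Iceberg, or Hudi table.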

Module 7: Data Transformation with dbt on the Lakehouse

  • Introduction to dbt core and dbt Cloud
  • Building modular SQL models
  • Testing and documenting transformations
  • Orchestrating dbt with Airflow
  • Lineage visualization and deployment best practices

Module 8: Metadata Management and Governance

  • Cataloging data with Unity Catalog, Hive Metastore, AWS Glue
  • Implementing data classifications and tags
  • Role-based access control and fine-grained permissions
  • Managing table versions and audit logs
  • Lineage tracking with OpenMetadata or DataHub

Module 9: Workflow Orchestration and Automation

  • Creating DAGs in Airflow for Lakehouse pipelines
  • Managing dependencies and retries
  • Scheduling workflows and integrating alerts
  • Parameterization and configuration
  • Using Prefect or Dagster as alternatives
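The two core orchestration ideas in this module, dependency ordering and retries, can be sketched in plain Python without installing Airflow (task names and the retry count are illustrative; Airflow expresses the same things through DAG definitions and the `retries` task argument):

```python
def run_pipeline(tasks, deps, max_retries=2):
    """Toy orchestrator: run each task after its upstream dependencies,
    retrying a failing task up to `max_retries` extra times."""
    done, order = set(), []

    def run(name):
        if name in done:
            return
        for upstream in deps.get(name, []):   # run dependencies first
            run(upstream)
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:    # retries exhausted
                    raise
        done.add(name)
        order.append(name)

    for name in tasks:
        run(name)
    return order

tasks = {"load": lambda: None, "transform": lambda: None, "extract": lambda: None}
deps = {"transform": ["extract"], "load": ["transform"]}
order = run_pipeline(tasks, deps)
```

Regardless of the order tasks are declared in, dependencies force the familiar extract → transform → load execution order.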

Module 10: Data Quality and Observability

  • Writing validation rules with Great Expectations
  • Detecting anomalies and outliers in data
  • Monitoring freshness, volume, and distribution
  • Building dashboards with Superset or Grafana
  • Incident management and resolution
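Freshness and volume checks of the kind Great Expectations expresses declaratively can be sketched directly in Python (thresholds and field names are illustrative):

```python
from datetime import datetime, timedelta

def check_batch(rows, now, max_age_hours=24, min_rows=1):
    """Run simple volume and freshness checks on a batch of rows and
    return a list of failure messages (empty list means all checks pass)."""
    failures = []
    if len(rows) < min_rows:
        failures.append("volume: too few rows")
    if rows:
        newest = max(r["loaded_at"] for r in rows)
        if now - newest > timedelta(hours=max_age_hours):
            failures.append("freshness: data is stale")
    return failures

now = datetime(2024, 5, 2, 12, 0)
ok = check_batch([{"loaded_at": datetime(2024, 5, 2, 9, 0)}], now)
stale = check_batch([{"loaded_at": datetime(2024, 4, 25, 9, 0)}], now)
```

In production these results would feed the monitoring dashboards and incident workflows covered above rather than being inspected by hand.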

Module 11: Real-Time Analytics and Streaming

  • Streaming ingestion with Kafka and Spark Structured Streaming
  • Building real-time dashboards and microservices
  • Window functions and watermarking
  • Use cases in IoT, finance, and e-commerce
  • Trade-offs of stream vs. micro-batch
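Windowing and watermarking, as covered above, amount to bucketing events by time and discarding events that arrive too late. A minimal Python sketch of a tumbling-window count with a watermark (timestamps are plain seconds for simplicity; Spark Structured Streaming expresses the same policy with `window()` and `withWatermark`):

```python
def tumbling_window_counts(events, window_seconds, watermark_seconds, now):
    """Count events per tumbling window, dropping any event older than
    the watermark relative to `now`."""
    counts = {}
    for ts in events:
        if now - ts > watermark_seconds:
            continue                              # beyond the watermark: drop
        window_start = ts - (ts % window_seconds) # bucket into its window
        counts[window_start] = counts.get(window_start, 0) + 1
    return counts

# 60 s tumbling windows, 150 s watermark, current time = 200 s
counts = tumbling_window_counts([10, 70, 75, 180], 60, 150, 200)
```

The event at t=10 is dropped as too late; the rest land in their windows, which is exactly the completeness-versus-latency trade-off the watermark controls.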

Module 12: Machine Learning on the Lakehouse

  • Integrating ML models into pipelines
  • Feature engineering and feature stores
  • Using MLflow for tracking and deployment
  • Batch inference vs. real-time scoring
  • Lakehouse as a foundation for MLOps
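Batch inference on the Lakehouse is, at its core, applying a trained model across a table of features and writing the predictions back as a column. A minimal sketch (the stand-in linear model and feature name are illustrative; in practice the model would be loaded from an MLflow registry and the rows would be a Spark DataFrame):

```python
def batch_score(rows, model):
    """Batch inference: apply a model to each feature row and attach
    the prediction as a new `score` field."""
    return [{**row, "score": model(row["x"])} for row in rows]

model = lambda x: 2 * x + 1   # stand-in for a real trained model
scored = batch_score([{"x": 1.0}, {"x": 3.0}], model)
```

Real-time scoring differs mainly in packaging: the same model is served behind an endpoint and invoked per request instead of per batch.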

Module 13: Performance Optimization and Cost Management

  • Query optimization with Z-Ordering and caching
  • Choosing the right file size and format
  • Auto-scaling compute resources
  • Storage tiering and archival
  • Tracking cost per query and job
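One of the most common performance fixes on a lake is compacting many small files into fewer right-sized ones. A greedy sketch of a compaction plan in Python (the 128 MB target is a commonly used default, not a universal rule):

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedy small-file compaction plan: group input files so each
    output file approaches (without far exceeding) the target size."""
    groups, current, size = [], [], 0
    for f in sorted(file_sizes_mb):
        if size + f > target_mb and current:
            groups.append(current)        # close the current output file
            current, size = [], 0
        current.append(f)
        size += f
    if current:
        groups.append(current)
    return groups

plan = plan_compaction([5, 5, 120, 10, 60], target_mb=128)
```

Engines offer this natively, e.g. Delta Lake's `OPTIMIZE` command, but the planning idea is the same: fewer, larger files mean fewer open/seek operations per query.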

Module 14: Security and Compliance in the Lakehouse

  • Implementing encryption at rest and in transit
  • Fine-grained access with row- and column-level security
  • Data masking and tokenization techniques
  • GDPR and HIPAA compliance measures
  • Audit trails and anomaly detection
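Tokenization and masking can be sketched with the standard library (the secret key, field formats, and token length here are illustrative only; in production the key lives in a managed secret store):

```python
import hashlib
import hmac

SECRET = b"demo-key"   # illustrative only; use a managed secret in practice

def tokenize(value: str) -> str:
    """Deterministic tokenization: replace a sensitive value with a keyed
    hash, so joins on the token still work but the raw value is never stored."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Simple masking: keep the domain, hide most of the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

token = tokenize("1985-02-14")
masked = mask_email("awa.diop@example.com")
```

Because the token is deterministic for a given key, two tables tokenized with the same key can still be joined without ever exposing the underlying value.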

Module 15: Building Production-Ready Lakehouse Pipelines

  • CI/CD and version control integration
  • Deploying and monitoring end-to-end workflows
  • Handling failures, retries, and rollbacks
  • Documentation and handover practices
  • Case study: From ingestion to BI using the Lakehouse stack

Training Approach

This course will be delivered by our skilled trainers, who have vast knowledge and experience as expert professionals in their fields. The course is taught in English through a mix of theory, practical activities, group discussions, and case studies. Course manuals and additional training materials will be provided to participants upon completion of the training.

Tailor-Made Course

This course can also be tailor-made to meet an organization's requirements. For further inquiries, please contact us via Email: info@skillsforafrica.org / training@skillsforafrica.org, Tel: +254 702 249 449

Training Venue

The training will be held at our Skills for Africa Training Institute Training Centre. We also offer group training at a requested location anywhere in the world. The course fee covers tuition, training materials, two break refreshments, and a buffet lunch.

Visa applications, travel expenses, airport transfers, dinners, accommodation, insurance, and other personal expenses are the responsibility of the participant.

Certification

Participants will be issued with a Skills for Africa Training Institute certificate upon completion of this course.

Airport Pickup and Accommodation

Airport pickup and accommodation are arranged upon request. For bookings, contact our Training Coordinator via Email: info@skillsforafrica.org / training@skillsforafrica.org, Tel: +254 702 249 449

Terms of Payment: Unless otherwise agreed between the two parties, payment of the course fee should be made 7 working days before commencement of the training.
