Data Engineering Training
Training Summary
This comprehensive Data Engineering training program is designed to equip participants with the skills and knowledge necessary to design, build, and maintain robust data pipelines. The course will cover a wide range of topics, including:
- Data Modeling and Design: Creating efficient and scalable data models.
- Data Extraction, Transformation, and Loading (ETL): Designing and implementing ETL processes to extract, transform, and load data into data warehouses and data lakes.
- Data Warehousing and Data Lakes: Understanding the concepts and architecture of data warehouses and data lakes.
- Big Data Technologies: Leveraging technologies like Hadoop, Spark, and Kafka to process large datasets.
- Cloud Data Engineering: Utilizing cloud platforms like AWS, Azure, and GCP for data engineering tasks.
- Data Integration and API Development: Integrating data from various sources and building APIs to expose data.
- Data Quality and Governance: Ensuring data quality and implementing data governance practices.
Training Objectives
Upon completion of this training, participants will be able to:
- Design Data Models: Create efficient and scalable data models.
- Implement ETL Processes: Design and implement ETL processes to extract, transform, and load data.
- Build Data Warehouses and Data Lakes: Design and implement data warehouses and data lakes.
- Process Big Data: Handle and process large datasets using big data technologies.
- Leverage Cloud Platforms: Utilize cloud platforms for data engineering tasks.
- Integrate Data Sources: Integrate data from various sources, including databases, APIs, and files.
- Develop APIs: Build APIs to expose data to other applications.
- Ensure Data Quality: Implement data quality checks and monitoring.
- Govern Data: Implement data governance policies and procedures.
- Troubleshoot Data Pipelines: Identify and resolve issues in data pipelines.
By achieving these objectives, participants will be well-prepared to design, build, and maintain efficient and scalable data pipelines to support data-driven decision-making.
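The ETL and data quality objectives above can be illustrated with a small, self-contained pipeline. The sketch below extracts rows from CSV text, transforms them, applies simple quality checks, and loads the valid rows into an in-memory SQLite table. The sample data, the `orders` table, and the two quality rules (customer present, amount non-negative) are hypothetical assumptions; SQLite only stands in for a real data warehouse or data lake.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a source file or API extract.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,5.00
3,,12.50
4,carol,-3.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast column types and normalize customer names."""
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def check_quality(rows):
    """Split rows into valid and rejected using assumed, illustrative rules."""
    valid, rejected = [], []
    for r in rows:
        if r["customer"] and r["amount"] >= 0:
            valid.append(r)
        else:
            rejected.append(r)
    return valid, rejected


def load(rows, conn):
    """Load: write the valid rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
valid, rejected = check_quality(transform(extract(RAW_CSV)))
load(valid, conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
print(len(rejected), "rows rejected")
```

Keeping each stage as a separate function makes it possible to test, monitor, and rerun the stages independently, which also helps when troubleshooting a failing pipeline.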
The program combines an introduction to data engineering with hands-on modules in Python, SQL, data modeling, Pandas, and version control with GitHub, giving participants a comprehensive understanding of data engineering principles and practices and preparing them to build and manage data pipelines.
The program will cover the following modules:
Introduction to Data Engineering
- Data vs. Big Data
- What is Data Engineering
- Data Engineering Concepts
- Data Storage and Management
- Data Ingestion
- Data Pipeline Design and Orchestration
- Cloud Computing
- Types of Cloud Computing
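Pipeline orchestration is commonly modeled as a directed acyclic graph (DAG) of tasks, where each task runs only after its upstream dependencies have finished. The sketch below shows that idea using Python's standard-library `graphlib` module (Python 3.9+). The task names are hypothetical and each task only records that it ran; a real orchestrator would add scheduling, retries, and monitoring.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join_orders_customers"},
}


def run_task(name, log):
    """Placeholder task body; a real task would move or transform data."""
    log.append(name)


def run_pipeline(dag):
    """Run every task once, only after all of its upstream tasks have run."""
    log = []
    # static_order() yields tasks in a valid dependency (topological) order
    # and raises graphlib.CycleError if the graph contains a cycle.
    for task in TopologicalSorter(dag).static_order():
        run_task(task, log)
    return log


execution_order = run_pipeline(PIPELINE)
print(" -> ".join(execution_order))
```

Declaring dependencies instead of hard-coding a sequence lets independent tasks (here, the two extracts) run in any order or in parallel.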
Python Programming
- Getting Started with Python
- Data Types and Structures
- Comparison Operators and Conditional Statements
- For Loop and While Loop
- Functions
- NumPy
- Pandas
- Data Visualization (Matplotlib & Seaborn)
- ChatGPT for Coding
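To connect the Python fundamentals above to data engineering work, the sketch below uses a list of tuples, a dictionary, for and while loops, comparison operators, conditional statements, and functions to summarize sensor readings. The sample data, the 30.0 threshold, and the `retry` helper are hypothetical and chosen only for illustration.

```python
def summarize_readings(readings, threshold=30.0):
    """Average each sensor's readings and flag sensors above a threshold.

    `readings` is a list of (sensor_id, value) tuples; None marks a missing value.
    """
    values_by_sensor = {}  # dict: sensor_id -> list of numeric readings
    for sensor_id, value in readings:  # for loop over a list of tuples
        if value is None:  # conditional statement: skip missing data
            continue
        values_by_sensor.setdefault(sensor_id, []).append(value)

    # Dictionary comprehension computing the mean per sensor
    averages = {
        sensor: round(sum(vals) / len(vals), 2)
        for sensor, vals in values_by_sensor.items()
    }
    # Comparison operator (>) used to flag sensors above the threshold
    flagged = sorted(sensor for sensor, avg in averages.items() if avg > threshold)
    return averages, flagged


def retry(task, max_attempts=3):
    """Call `task` until it returns True; return the attempt number or None."""
    attempt = 0
    while attempt < max_attempts:  # while loop bounded by max_attempts
        attempt += 1
        if task():
            return attempt
    return None


sample = [("s1", 28.0), ("s2", 31.5), ("s1", None), ("s2", 33.0), ("s1", 29.0)]
averages, flagged = summarize_readings(sample)
print(averages, flagged)

outcomes = iter([False, False, True])  # hypothetical task: fails twice, then succeeds
print(retry(lambda: next(outcomes)))
```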
SQL (Structured Query Language)
- Introduction to SQL
- SQL Commands
- SQL for Data Analytics
- Data Query Language
- Aggregate Functions
- Joins and Subqueries
- Window Functions
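The SQL topics above (queries, aggregate functions, joins, subqueries, and window functions) can be practiced without a database server by running SQL through Python's built-in `sqlite3` module. The sketch below assumes two hypothetical tables, `customers` and `orders`, and an SQLite version of 3.25 or later, which is required for window functions such as `RANK()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 30.0);
    """
)

# Aggregate + LEFT JOIN: total spend per customer (no orders -> 0).
totals = conn.execute(
    """
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# Subquery: customers whose total spend exceeds the average order amount.
big_spenders = conn.execute(
    """
    SELECT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY c.name
    """
).fetchall()

# Window function: rank each order within its customer by amount.
ranked = conn.execute(
    """
    SELECT order_id, customer_id, amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY order_id
    """
).fetchall()

print(totals, big_spenders, ranked, sep="\n")
```

Unlike `GROUP BY`, which collapses rows into one per group, the window function keeps every order row and adds the rank alongside it.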
Database and Data Warehouse Modeling
- Introduction to Data Modeling
- Normalization vs Denormalization
- Using Lucidchart / Draw.io to Draw ER Diagrams
- Star Schema
- Snowflake Schema
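A star schema places numeric measures in a central fact table that references surrounding dimension tables holding descriptive attributes. The sketch below builds a small, hypothetical sales star schema in an in-memory SQLite database and runs a typical analytical query that joins the fact table to its dimensions. The table names, columns, and data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: numeric measures plus a foreign key to each dimension
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01'),
                                (20240201, '2024-02-01', '2024-02');
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'),
                                   (2, 'Desk', 'Furniture');
    INSERT INTO dim_store VALUES (1, 'Berlin'), (2, 'Paris');
    INSERT INTO fact_sales VALUES (20240101, 1, 1, 2, 2000.0),
                                  (20240101, 2, 2, 1, 300.0),
                                  (20240201, 1, 2, 1, 1000.0);
    """
)

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
revenue_by_month_category = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()

for row in revenue_by_month_category:
    print(row)
```

In a snowflake schema, `category` would instead be normalized into its own table referenced from `dim_product`, trading an extra join for less duplicated data.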
Pandas
- Introduction to Pandas
- Creating Pandas Series & DataFrames
- Basic Pandas Operations - Projecting & Filtering
- Joins - Inner, Left, Right, Outer
- Advanced DataFrame Operations
- Slicing & Indexing using Pandas
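The Pandas operations listed above can be seen together in one short script. The sketch below assumes the third-party `pandas` package is installed and uses two hypothetical DataFrames; it shows projecting and filtering with `.loc`, the four join types through the `how` argument of `DataFrame.merge`, and label-based (`.loc`) versus position-based (`.iloc`) indexing.

```python
import pandas as pd  # third-party package, assumed to be installed

customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]}
)
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "customer_id": [1, 1, 4], "amount": [50.0, 20.0, 30.0]}
)

# Projecting (choosing columns) and filtering (choosing rows) in one .loc call
large_orders = orders.loc[orders["amount"] > 25, ["order_id", "amount"]]

# Inner, left, right, and outer joins map to the `how` argument of merge
joins = {
    how: customers.merge(orders, on="customer_id", how=how)
    for how in ("inner", "left", "right", "outer")
}

for how, df in joins.items():
    print(f"--- {how} join ({len(df)} rows) ---")
    print(df)

# Slicing & indexing: label-based (.loc) vs position-based (.iloc)
indexed = customers.set_index("customer_id")
print(indexed.loc[2, "name"])  # look up by index label
print(customers.iloc[0:2])     # first two rows by position
```

Because customer 4 has an order but no customer record, and customers 2 and 3 have no orders, the four joins return different row counts, which makes the difference between them easy to see.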
GitHub for Version Control
- Introduction - Git and GitHub Overview
- Overview of GitHub Desktop
- Operations using GitHub
- Using GitHub with Python, SQL, and Data Modeling