AWS Data Engineering

The AWS Data Engineering course is designed to provide in-depth knowledge and practical skills required to build, maintain, and optimize data pipelines and architectures on the Amazon Web Services (AWS) platform.

Course Rating: 4.8 (926)

Course Overview

This course is aimed at individuals looking to develop expertise in handling large volumes of data efficiently and effectively in a cloud environment. AWS Data Engineering involves designing and managing data workflows and pipelines on AWS, from data ingestion and storage to transformation and analysis. This includes leveraging a variety of AWS services to collect, process, store, and analyze data, ensuring that data is accessible and useful for business intelligence and analytics purposes.

Course Curriculum

In this course, you will learn about the following topics:

  • The rise of big data as a corporate asset
  • The challenges of ever-growing datasets
  • Data engineers – the big data enablers
  • Understanding the role of the data engineer
  • Understanding the role of the data scientist
  • Understanding the role of the data analyst
  • Understanding other common data-related roles
  • The benefits of the cloud when building big data analytic solutions
  • The evolution of data management for analytics
  • Databases and data warehouses
  • Dealing with big, unstructured data
  • A lake on the cloud and a house on that lake
  • Understanding data warehouses and data marts – fountains of truth
  • Distributed storage and massively parallel processing
  • Columnar data storage and efficient data compression
  • Dimensional modeling in data warehouses
  • Understanding the role of data marts
  • Feeding data into the warehouse – ETL and ELT pipelines
  • Building data lakes to tame the variety and volume of big data
  • Data lake logical architecture
  • Bringing together the best of both worlds with the lake house architecture
  • Data lakehouse implementations
  • Building a data lakehouse on AWS
  • Hands-on – configuring the AWS Command Line Interface (CLI) tool and creating an S3 bucket
  • Installing and configuring the AWS CLI
  • Creating a new Amazon S3 bucket
  • Overview of AWS Database Migration Service (DMS)
  • Overview of Amazon Kinesis for streaming data ingestion
  • Overview of Amazon MSK for streaming data ingestion
  • Overview of Amazon AppFlow for ingesting data from SaaS services
  • Overview of AWS Transfer Family for ingestion using FTP/SFTP protocols
  • Overview of AWS DataSync for ingesting from on-premises storage
  • Overview of the AWS Snow family of devices for large data transfers
  • Overview of AWS Lambda for light transformations
  • Overview of AWS Glue for serverless Spark processing
  • Overview of Amazon EMR for Hadoop ecosystem processing
  • Overview of AWS Glue workflows for orchestrating Glue components
  • Overview of AWS Step Functions for complex workflows
  • Overview of Amazon Managed Workflows for Apache Airflow (MWAA)
  • Overview of Amazon Athena for SQL queries in the data lake
  • Overview of Amazon Redshift and Redshift Spectrum for data warehousing and data lakehouse architectures
  • Overview of Amazon QuickSight for visualizing data
  • Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket (a minimal handler sketch follows this list)
  • Creating a Lambda layer containing the AWS Data Wrangler library
  • Creating new Amazon S3 buckets
  • Creating an IAM policy and role for your Lambda function
  • Creating a Lambda function
  • Configuring our Lambda function to be triggered by an S3 upload
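
For the Lambda hands-on above, here is a minimal sketch of the kind of handler that reacts to an S3 upload. It only logs the new object; the course's actual function layers in the AWS Data Wrangler library, and the processing logic here is illustrative.

    import json
    import urllib.parse

    def lambda_handler(event, context):
        # Each record describes one object-created notification from S3.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (spaces become '+').
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            print(f"New file arrived: s3://{bucket}/{key}")
        return {"statusCode": 200, "body": json.dumps("event processed")}

Attaching this function to the bucket's ObjectCreated event notification, as configured in the hands-on, causes it to run once per uploaded file.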

Getting data security and governance right

  • Common data regulatory requirements
  • Core data protection concepts
  • Personal data
  • Encryption
  • Anonymized data
  • Pseudonymized data/tokenization
  • Authentication
  • Authorization


Cataloging your data to avoid the data swamp

  • How to avoid the data swamp


The AWS Glue/Lake Formation data catalog

AWS services for data encryption and security monitoring

  • AWS Key Management Service (KMS)
  • Amazon Macie
  • Amazon GuardDuty


AWS services for managing identity and permissions

  • AWS Identity and Access Management (IAM) service
  • Using AWS Lake Formation to manage data lake access


Hands-on – configuring Lake Formation permissions

  • Creating a new user with IAM permissions
  • Transitioning to managing fine-grained permissions with AWS Lake Formation
  • Architecting Data Engineering Pipelines
  • Approaching the data pipeline architecture
  • Architecting houses and architecting pipelines
  • Conducting a whiteboarding session
  • Identifying data consumers and understanding their requirements
  • Identifying data sources and ingesting data
  • Identifying data transformations and optimizations
  • File format optimizations
  • Data standardization
  • Data quality checks
  • Data partitioning
  • Data denormalization
  • Data cataloging
  • Loading data into data marts
  • Wrapping up the whiteboarding session
  • Understanding data sources
  • Data variety
  • Data volume
  • Data velocity
  • Data veracity
  • Data value
  • Questions to ask
  • Ingesting data from a relational database
  • AWS Database Migration Service (DMS)
  • AWS Glue
  • Other ways to ingest data from a database
  • Deciding on the best approach for ingesting from a database
  • Amazon Kinesis versus Amazon Managed Streaming for Apache Kafka (MSK)
  • Hands-on – ingesting data with AWS DMS
  • Creating a new MySQL database instance
  • Loading the demo data using an Amazon EC2 instance
  • Creating an IAM policy and role for DMS
  • Configuring DMS settings and performing a full load from MySQL to S3
  • Querying data with Amazon Athena (see the query sketch after this list)
  • Hands-on – ingesting streaming data (a producer sketch also follows this list)
  • Configuring Kinesis Data Firehose for streaming delivery to Amazon S3
  • Configuring Amazon Kinesis Data Generator (KDG)
  • Adding newly ingested data to the Glue Data Catalog
  • Querying the data with Amazon Athena
  • Technical requirements
  • Transformations – making raw data more valuable
  • Cooking, baking, and data transformations
  • Transformations as part of a pipeline
  • Types of data transformation tools
  • Apache Spark
  • Hadoop and MapReduce
  • SQL
  • GUI-based tools
  • Data preparation transformations
  • Protecting PII data
  • Optimizing the file format
  • Optimizing with data partitioning
  • Data cleansing
  • Business use case transforms
  • Data denormalization
  • Enriching data
  • Pre-aggregating data
  • Extracting metadata from unstructured data
  • Working with change data capture (CDC) data
  • Traditional approaches – data upserts and SQL views
  • Modern approaches – the transactional data lake
  • Hands-on – joining datasets with AWS Glue Studio (a plain-PySpark equivalent is sketched after this list)
  • Creating a new data lake zone – the curated zone
  • Creating a new IAM role for the Glue job
  • Configuring a denormalization transform using AWS Glue Studio
  • Finalizing the denormalization transform job to write to S3
  • Creating a transform job to join streaming and film data using AWS Glue Studio
  • Understanding the impact of data democratization
  • A growing variety of data consumers
  • Meeting the needs of business users with data visualization
  • AWS tools for business users
  • Meeting the needs of data analysts with structured reporting
  • AWS tools for data analysts
  • Meeting the needs of data scientists and ML models
  • AWS tools used by data scientists to work with data
  • Hands-on – creating data transformations with AWS Glue DataBrew
  • Configuring new datasets for AWS Glue DataBrew
  • Creating a new Glue DataBrew project
  • Building your Glue DataBrew recipe
  • Creating a Glue DataBrew job
  • Extending analytics with data warehouses/data marts
  • Cold data
  • Warm data
  • Hot data
  • What not to do – anti-patterns for a data warehouse
  • Using a data warehouse as a transactional datastore
  • Using a data warehouse as a data lake
  • Using data warehouses for real-time, record-level use cases
  • Storing unstructured data
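
Both ingestion hands-on exercises above end by querying the newly cataloged data with Amazon Athena. Here is a minimal boto3 sketch; the database name, table, and results bucket are hypothetical.

    import boto3

    athena = boto3.client("athena")

    # Start a query against the Glue Data Catalog. Athena runs queries
    # asynchronously, so production code would poll get_query_execution
    # until the state is SUCCEEDED before fetching results.
    response = athena.start_query_execution(
        QueryString="SELECT category, COUNT(*) FROM film GROUP BY category",
        QueryExecutionContext={"Database": "dms_demo"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print("Query execution id:", response["QueryExecutionId"])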
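
For the streaming-ingestion hands-on, records written to a Kinesis Data Firehose delivery stream are buffered and delivered to S3 in batches. A minimal producer sketch is below; the stream name and payload are illustrative, and the course itself generates load with the Kinesis Data Generator.

    import json
    import boto3

    firehose = boto3.client("firehose")

    def send_event(stream_name: str, payload: dict) -> None:
        # Newline-delimit each JSON record so the files Firehose writes
        # to S3 can be read by Athena as JSON lines.
        firehose.put_record(
            DeliveryStreamName=stream_name,
            Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
        )

    send_event("demo-ingest-stream", {"user_id": 42, "action": "click"})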
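
For the Glue Studio hands-on, the generated job boils down to a join plus a write to the curated zone. Glue Studio emits a more elaborate GlueContext-based script; this plain-PySpark sketch shows the equivalent logic, with hypothetical paths and column names.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("denormalize").getOrCreate()

    # Hypothetical raw-zone inputs created earlier in the course.
    films = spark.read.parquet("s3://my-data-lake/raw/films/")
    streams = spark.read.parquet("s3://my-data-lake/raw/streaming/")

    # Denormalize: attach film attributes to every streaming event.
    joined = streams.join(films, on="film_id", how="left")  # hypothetical key

    # Write to the curated zone, partitioned so Athena can prune reads.
    (joined.write.mode("overwrite")
           .partitionBy("category")  # hypothetical partition column
           .parquet("s3://my-data-lake/curated/streaming_films/"))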

Data distribution across slices

  • Redshift Zone Maps and sorting data
  • Designing a high-performance data warehouse
  • Selecting the optimal Redshift node type
  • Selecting the optimal table distribution style and sort key
  • Selecting the right data type for columns
  • Selecting the optimal table type
  • Moving data between a data lake and Redshift
  • Optimizing data ingestion in Redshift
  • Exporting data from Redshift to the data lake
  • Hands-on – loading data into an Amazon Redshift cluster and running queries (a COPY sketch appears after this curriculum section)
  • Uploading our sample data to Amazon S3

  • IAM roles for Redshift
  • Creating a Redshift cluster
  • Creating external tables for querying data in S3
  • Creating a schema for a local Redshift table
  • Running complex SQL queries against our data
  • Understanding the core concepts for pipeline orchestration
  • What is a data pipeline, and how do you orchestrate it?
  • How do you trigger a data pipeline to run?
  • How do you handle the failures of a step in your pipeline?
  • Examining the options for orchestrating pipelines in AWS
  • AWS Data Pipeline for managing ETL between data sources
  • AWS Glue Workflows to orchestrate Glue resources
  • Apache Airflow as an open-source orchestration solution
  • Pros and cons of using MWAA
  • AWS Step Functions for a serverless orchestration solution
  • Pros and cons of using AWS Step Functions
  • Deciding on which data pipeline orchestration tool to use
  • Hands-on – orchestrating a data pipeline using AWS Step Functions (a state machine sketch follows this list)
  • Creating new Lambda functions
  • Creating an SNS topic and subscribing to an email address
  • Creating a new Step Function state machine
  • Configuring AWS CloudTrail and Amazon EventBridge
  • Amazon Athena – in-place SQL analytics for the data lake
  • Tips and tricks to optimize Amazon Athena queries
  • Common file format and layout optimizations
  • Writing optimized SQL queries
  • Federating the queries of external data sources with Amazon Athena Query Federation
  • Querying external data sources using Athena Federated Query
  • Managing governance and costs with Amazon Athena Workgroups
  • Athena Workgroups overview
  • Enforcing settings for groups of users
  • Enforcing data usage controls
  • Hands-on – creating an Amazon Athena workgroup and configuring Athena settings (a workgroup-creation sketch follows this list)
  • Hands-on – switching workgroups and running queries
  • Representing data visually for maximum impact
  • Benefits of data visualization
  • Popular uses of data visualizations
  • Understanding Amazon QuickSight’s core concepts
  • Standard versus enterprise edition
  • SPICE – the in-memory storage and computation engine for QuickSight
  • Ingesting and preparing data from a variety of sources
  • Preparing datasets in QuickSight versus performing ETL outside of QuickSight
  • Creating and sharing visuals with QuickSight analyses and dashboards
  • Visual types in Amazon QuickSight
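
For the Redshift hands-on, the bulk load itself is a single COPY statement. Here is a minimal sketch that submits it through the Redshift Data API with boto3; the cluster, database, table, and IAM role names are hypothetical.

    import boto3

    rsd = boto3.client("redshift-data")

    # COPY is the recommended path for bulk-loading S3 data into Redshift:
    # it parallelizes the load across slices instead of inserting row by row.
    resp = rsd.execute_statement(
        ClusterIdentifier="demo-cluster",   # hypothetical cluster
        Database="dev",
        DbUser="awsuser",
        Sql="""
            COPY public.sales
            FROM 's3://my-sample-data/sales/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
            FORMAT AS CSV IGNOREHEADER 1;
        """,
    )
    print("Statement id:", resp["Id"])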
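
For the orchestration hands-on, the Step Functions state machine is described in Amazon States Language. A minimal two-state sketch created with boto3 follows; every ARN is a placeholder, and the actual hands-on wires in the Lambda functions and SNS topic created earlier.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Two states: invoke a processing Lambda, then publish a notification
    # to SNS via the Step Functions service integration.
    definition = {
        "StartAt": "ProcessFile",
        "States": {
            "ProcessFile": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-file",
                "Next": "Notify",
            },
            "Notify": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
                    "Message.$": "$",
                },
                "End": True,
            },
        },
    }

    sfn.create_state_machine(
        name="demo-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder
    )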
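
For the workgroup hands-on, here is a minimal sketch that creates an Athena workgroup enforcing a query-result location and a per-query data-scan limit; the name, bucket, and limit are illustrative.

    import boto3

    athena = boto3.client("athena")

    athena.create_work_group(
        Name="analyst-workgroup",
        Configuration={
            # Force all queries in this workgroup to write results here.
            "ResultConfiguration": {
                "OutputLocation": "s3://my-athena-results/analysts/"
            },
            "EnforceWorkGroupConfiguration": True,
            # Cancel any query that scans more than ~10 GB of data.
            "BytesScannedCutoffPerQuery": 10 * 1024 ** 3,
        },
    )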

Learning Outcome

Upon completing the AWS Data Engineering course, participants can expect to achieve the following learning outcomes:

  • Master Data Ingestion Techniques
  • Implement Effective Data Storage Solutions
  • Develop and Manage ETL Processes
  • Integrate Data Across Multiple Services
  • Perform Data Analysis and Visualization
  • Ensure Data Security and Compliance
  • Optimize Performance and Cost Efficiency
  • Deploy Big Data Solutions
  • Integrate Machine Learning Models
  • Gain Practical Experience
  • Prepare for AWS Certification

Who this course is for?

The following professionals can advance their careers by taking AWS Data Engineering training:

  • Data Analysts
  • Data Engineers
  • Data Scientists
  • Database Architects
  • IT professionals and freshers who wish to build their careers in advanced data warehouse tools

FAQs

What is AWS Data Engineering?

AWS Data Engineering involves designing, building, and managing data pipelines and architectures on the Amazon Web Services (AWS) platform to collect, store, process, and analyze data efficiently.

Who is this course for?

This course is ideal for data engineers, data architects, data analysts, and IT professionals interested in learning how to leverage AWS services for managing and optimizing data workflows.

What are the prerequisites?

Basic knowledge of databases and SQL, plus familiarity with cloud computing concepts, is recommended. Experience with AWS services or data engineering principles is beneficial but not required.

What topics does the course cover?

The course covers data ingestion with AWS Data Pipeline and Amazon Kinesis, data storage using Amazon S3 and Amazon Redshift, ETL processes with AWS Glue and Amazon EMR, data analysis with tools like Amazon Athena and Amazon QuickSight, and more.

How long does the course take?

Course durations vary by provider and format (self-paced vs. instructor-led). Typically, it ranges from a few weeks to a few months, depending on the depth of coverage and your learning pace.

Will this course prepare me for AWS certification?

Yes, completing the AWS Data Engineering course will give you the knowledge and skills needed to pursue relevant AWS certifications, such as the AWS Certified Data Analytics – Specialty exam.

Does the course include hands-on practice?

Yes, the course includes hands-on labs and real-world projects that let you apply theoretical concepts to practical scenarios, gaining valuable experience in designing and managing data workflows on AWS.

How is the course delivered?

The course can be delivered through online self-paced learning platforms, live virtual instructor-led training (VILT), or blended formats that combine both approaches to accommodate different learning preferences.

What support is available during the course?

Participants typically have access to online forums, instructor Q&A sessions, and technical support for help with course content, labs, and projects.

How will this course benefit my career?

Completing this course makes you proficient in designing scalable data solutions on AWS and improves your career prospects in data engineering and analytics roles.

Certifications

Here are some AWS data-related certifications that you might find relevant:

AWS Certified Data Analytics – Specialty: This certification validates your expertise in using AWS data lakes and analytics services to derive insights from data.

AWS Certified Database – Specialty: This certification is for those who demonstrate an understanding of AWS database services and how to help an organization leverage the performance and benefits of modern database solutions.

AWS Certified Data Engineer – Associate: This newer certification for data engineers validates skills and knowledge in core data-related AWS services, including the ability to ingest and transform data, orchestrate data pipelines, design data models, manage data life cycles, and ensure data quality.


Prerequisites

There are no mandatory prerequisites for learning AWS Data Engineering, but basic knowledge of or experience with data warehousing and SQL is an added advantage.

Our Other Courses

The AWS Data Engineering course is designed to provide in-depth knowledge and practical skills required to build, maintain, and optimize data pipelines.

In this Azure Data Engineering training course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure.

RBCloudGenX’s Snowflake training online is aligned with the latest curriculum of the Snowflake certification exam.

RBCloudGenX Databricks course is designed to equip learners with the knowledge and skills necessary to work with Apache Spark and Databricks.
