AWS Data Engineering

The AWS Data Engineering course is designed to provide in-depth knowledge and practical skills required to build, maintain, and optimize data pipelines and architectures on the Amazon Web Services (AWS) platform.

Course Rating: 4.8 (926)

Course Overview

This course is aimed at individuals looking to develop expertise in handling large volumes of data efficiently and effectively in a cloud environment. AWS Data Engineering involves designing and managing data workflows and pipelines on AWS, from data ingestion and storage to transformation and analysis. This includes leveraging a variety of AWS services to collect, process, store, and analyze data, ensuring that data is accessible and useful for business intelligence and analytics purposes.

Course Curriculum

In this course, you will learn about the following topics:

  • The rise of big data as a corporate asset
  • The challenges of ever-growing datasets
  • Data engineers – the big data enablers
  • Understanding the role of the data engineer
  • Understanding the role of the data scientist
  • Understanding the role of the data analyst
  • Understanding other common data-related roles
  • The benefits of the cloud when building big data analytic solutions
  • The evolution of data management for analytics
  • Databases and data warehouses
  • Dealing with big, unstructured data
  • A lake on the cloud and a house on that lake
  • Understanding data warehouses and data marts – fountains of truth
  • Distributed storage and massively parallel processing
  • Columnar data storage and efficient data compression
  • Dimensional modeling in data warehouses
  • Understanding the role of data marts
  • Feeding data into the warehouse – ETL and ELT pipelines
  • Building data lakes to tame the variety and volume of big data
  • Data lake logical architecture
  • Bringing together the best of both worlds with the lake house architecture
  • Data lakehouse implementations
  • Building a data lakehouse on AWS
  • Hands-on – configuring the AWS Command Line Interface (CLI) tool and creating an S3 bucket
  • Installing and configuring the AWS CLI
  • Creating a new Amazon S3 bucket
  • Overview of AWS Database Migration Service (DMS)
  • Overview of Amazon Kinesis for streaming data ingestion
  • Overview of Amazon MSK for streaming data ingestion
  • Overview of Amazon AppFlow for ingesting data from SaaS services
  • Overview of AWS Transfer Family for ingestion using FTP/SFTP protocols
  • Overview of AWS DataSync for ingesting from on-premises storage
  • Overview of the AWS Snow family of devices for large data transfers
  • Overview of AWS Lambda for light transformations
  • Overview of AWS Glue for serverless Spark processing
  • Overview of Amazon EMR for Hadoop ecosystem processing
  • Overview of AWS Glue workflows for orchestrating Glue components
  • Overview of AWS Step Functions for complex workflows
  • Overview of Amazon Managed Workflows for Apache Airflow (MWAA)
  • Overview of Amazon Athena for SQL queries in the data lake
  • Overview of Amazon Redshift and Redshift Spectrum for data warehousing and data lakehouse architectures
  • Overview of Amazon QuickSight for visualizing data
  • Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket (a minimal handler sketch follows this list)
  • Creating a Lambda layer containing the AWS Data Wrangler library
  • Creating new Amazon S3 buckets
  • Creating an IAM policy and role for your Lambda function
  • Creating a Lambda function
  • Configuring our Lambda function to be triggered by an S3 upload
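
For the Lambda hands-on above, here is a minimal sketch of the kind of handler that reacts to an S3 upload. It only logs the new object; the course's actual function layers in the AWS Data Wrangler library, and the processing logic here is illustrative.

    import json
    import urllib.parse

    def lambda_handler(event, context):
        # Each record describes one object-created notification from S3.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (spaces become '+').
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            print(f"New file arrived: s3://{bucket}/{key}")
        return {"statusCode": 200, "body": json.dumps("event processed")}

Attaching this function to the bucket's ObjectCreated event notification, as configured in the hands-on, causes it to run once per uploaded file.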

Getting data security and governance right

  • Common data regulatory requirements
  • Core data protection concepts
  • Personal data
  • Encryption
  • Anonymized data
  • Pseudonymized data/tokenization
  • Authentication
  • Authorization


Cataloging your data to avoid the data swamp

  • How to avoid the data swamp


The AWS Glue/Lake Formation data catalog

AWS services for data encryption and security monitoring

  • AWS Key Management Service (KMS)
  • Amazon Macie
  • Amazon GuardDuty


AWS services for managing identity and permissions

  • AWS Identity and Access Management (IAM) service
  • Using AWS Lake Formation to manage data lake access


Hands-on – configuring Lake Formation permissions

  • Creating a new user with IAM permissions
  • Transitioning to managing fine-grained permissions with AWS Lake Formation
  • Architecting Data Engineering Pipelines
  • Approaching the data pipeline architecture
  • Architecting houses and architecting pipelines
  • Conducting a whiteboarding session
  • Identifying data consumers and understanding their requirements
  • Identifying data sources and ingesting data
  • Identifying data transformations and optimizations
  • File format optimizations
  • Data standardization
  • Data quality checks
  • Data partitioning
  • Data denormalization
  • Data cataloging
  • Loading data into data marts
  • Wrapping up the whiteboarding session
  • Understanding data sources
  • Data variety
  • Data volume
  • Data velocity
  • Data veracity
  • Data value
  • Questions to ask
  • Ingesting data from a relational database
  • AWS Database Migration Service (DMS)
  • AWS Glue
  • Other ways to ingest data from a database
  • Deciding on the best approach for ingesting from a database
  • Amazon Kinesis versus Amazon Managed Streaming for Apache Kafka (MSK)
  • Hands-on – ingesting data with AWS DMS
  • Creating a new MySQL database instance
  • Loading the demo data using an Amazon EC2 instance
  • Creating an IAM policy and role for DMS
  • Configuring DMS settings and performing a full load from MySQL to S3
  • Querying data with Amazon Athena (see the query sketch after this list)
  • Hands-on – ingesting streaming data (a producer sketch also follows this list)
  • Configuring Kinesis Data Firehose for streaming delivery to Amazon S3
  • Configuring Amazon Kinesis Data Generator (KDG)
  • Adding newly ingested data to the Glue Data Catalog
  • Querying the data with Amazon Athena
  • Technical requirements
  • Transformations – making raw data more valuable
  • Cooking, baking, and data transformations
  • Transformations as part of a pipeline
  • Types of data transformation tools
  • Apache Spark
  • Hadoop and MapReduce
  • SQL
  • GUI-based tools
  • Data preparation transformations
  • Protecting PII data
  • Optimizing the file format
  • Optimizing with data partitioning
  • Data cleansing
  • Business use case transforms
  • Data denormalization
  • Enriching data
  • Pre-aggregating data
  • Extracting metadata from unstructured data
  • Working with change data capture (CDC) data
  • Traditional approaches – data upserts and SQL views
  • Modern approaches – the transactional data lake
  • Hands-on – joining datasets with AWS Glue Studio (a plain-PySpark equivalent is sketched after this list)
  • Creating a new data lake zone – the curated zone
  • Creating a new IAM role for the Glue job
  • Configuring a denormalization transform using AWS Glue Studio
  • Finalizing the denormalization transform job to write to S3
  • Creating a transform job to join streaming and film data using AWS Glue Studio
  • Understanding the impact of data democratization
  • A growing variety of data consumers
  • Meeting the needs of business users with data visualization
  • AWS tools for business users
  • Meeting the needs of data analysts with structured reporting
  • AWS tools for data analysts
  • Meeting the needs of data scientists and ML models
  • AWS tools used by data scientists to work with data
  • Hands-on – creating data transformations with AWS Glue DataBrew
  • Configuring new datasets for AWS Glue DataBrew
  • Creating a new Glue DataBrew project
  • Building your Glue DataBrew recipe
  • Creating a Glue DataBrew job
  • Extending analytics with data warehouses/data marts
  • Cold data
  • Warm data
  • Hot data
  • What not to do – anti-patterns for a data warehouse
  • Using a data warehouse as a transactional datastore
  • Using a data warehouse as a data lake
  • Using data warehouses for real-time, record-level use cases
  • Storing unstructured data
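
Both ingestion hands-on exercises above end by querying the newly cataloged data with Amazon Athena. Here is a minimal boto3 sketch; the database name, table, and results bucket are hypothetical.

    import boto3

    athena = boto3.client("athena")

    # Start a query against the Glue Data Catalog. Athena runs queries
    # asynchronously, so production code would poll get_query_execution
    # until the state is SUCCEEDED before fetching results.
    response = athena.start_query_execution(
        QueryString="SELECT category, COUNT(*) FROM film GROUP BY category",
        QueryExecutionContext={"Database": "dms_demo"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print("Query execution id:", response["QueryExecutionId"])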
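
For the streaming-ingestion hands-on, records written to a Kinesis Data Firehose delivery stream are buffered and delivered to S3 in batches. A minimal producer sketch is below; the stream name and payload are illustrative, and the course itself generates load with the Kinesis Data Generator.

    import json
    import boto3

    firehose = boto3.client("firehose")

    def send_event(stream_name: str, payload: dict) -> None:
        # Newline-delimit each JSON record so the files Firehose writes
        # to S3 can be read by Athena as JSON lines.
        firehose.put_record(
            DeliveryStreamName=stream_name,
            Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
        )

    send_event("demo-ingest-stream", {"user_id": 42, "action": "click"})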
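
For the Glue Studio hands-on, the generated job boils down to a join plus a write to the curated zone. Glue Studio emits a more elaborate GlueContext-based script; this plain-PySpark sketch shows the equivalent logic, with hypothetical paths and column names.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("denormalize").getOrCreate()

    # Hypothetical raw-zone inputs created earlier in the course.
    films = spark.read.parquet("s3://my-data-lake/raw/films/")
    streams = spark.read.parquet("s3://my-data-lake/raw/streaming/")

    # Denormalize: attach film attributes to every streaming event.
    joined = streams.join(films, on="film_id", how="left")  # hypothetical key

    # Write to the curated zone, partitioned so Athena can prune reads.
    (joined.write.mode("overwrite")
           .partitionBy("category")  # hypothetical partition column
           .parquet("s3://my-data-lake/curated/streaming_films/"))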

Data distribution across slices

  • Redshift Zone Maps and sorting data
  • Designing a high-performance data warehouse
  • Selecting the optimal Redshift node type
  • Selecting the optimal table distribution style and sort key
  • Selecting the right data type for columns
  • Selecting the optimal table type
  • Moving data between a data lake and Redshift
  • Optimizing data ingestion in Redshift
  • Exporting data from Redshift to the data lake
  • Hands-on – loading data into an Amazon Redshift cluster and running queries (a COPY sketch appears after this curriculum section)
  • Uploading our sample data to Amazon S3

  • IAM roles for Redshift
  • Creating a Redshift cluster
  • Creating external tables for querying data in S3
  • Creating a schema for a local Redshift table
  • Running complex SQL queries against our data
  • Understanding the core concepts for pipeline orchestration
  • What is a data pipeline, and how do you orchestrate it?
  • How do you trigger a data pipeline to run?
  • How do you handle the failures of a step in your pipeline?
  • Examining the options for orchestrating pipelines in AWS
  • AWS Data Pipeline for managing ETL between data sources
  • AWS Glue Workflows to orchestrate Glue resources
  • Apache Airflow as an open-source orchestration solution
  • Pros and cons of using MWAA
  • AWS Step Functions for a serverless orchestration solution
  • Pros and cons of using AWS Step Functions
  • Deciding on which data pipeline orchestration tool to use
  • Hands-on – orchestrating a data pipeline using AWS Step Functions (a state machine sketch follows this list)
  • Creating new Lambda functions
  • Creating an SNS topic and subscribing to an email address
  • Creating a new Step Function state machine
  • Configuring AWS CloudTrail and Amazon EventBridge
  • Amazon Athena – in-place SQL analytics for the data lake
  • Tips and tricks to optimize Amazon Athena queries
  • Common file format and layout optimizations
  • Writing optimized SQL queries
  • Federating the queries of external data sources with Amazon Athena Query Federation
  • Querying external data sources using Athena Federated Query
  • Managing governance and costs with Amazon Athena Workgroups
  • Athena Workgroups overview
  • Enforcing settings for groups of users
  • Enforcing data usage controls
  • Hands-on – creating an Amazon Athena workgroup and configuring Athena settings (a workgroup-creation sketch follows this list)
  • Hands-on – switching workgroups and running queries
  • Representing data visually for maximum impact
  • Benefits of data visualization
  • Popular uses of data visualizations
  • Understanding Amazon QuickSight’s core concepts
  • Standard versus enterprise edition
  • SPICE – the in-memory storage and computation engine for QuickSight
  • Ingesting and preparing data from a variety of sources
  • Preparing datasets in QuickSight versus performing ETL outside of QuickSight
  • Creating and sharing visuals with QuickSight analyses and dashboards
  • Visual types in Amazon QuickSight
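
For the Redshift hands-on, the bulk load itself is a single COPY statement. Here is a minimal sketch that submits it through the Redshift Data API with boto3; the cluster, database, table, and IAM role names are hypothetical.

    import boto3

    rsd = boto3.client("redshift-data")

    # COPY is the recommended path for bulk-loading S3 data into Redshift:
    # it parallelizes the load across slices instead of inserting row by row.
    resp = rsd.execute_statement(
        ClusterIdentifier="demo-cluster",   # hypothetical cluster
        Database="dev",
        DbUser="awsuser",
        Sql="""
            COPY public.sales
            FROM 's3://my-sample-data/sales/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
            FORMAT AS CSV IGNOREHEADER 1;
        """,
    )
    print("Statement id:", resp["Id"])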
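
For the orchestration hands-on, the Step Functions state machine is described in Amazon States Language. A minimal two-state sketch created with boto3 follows; every ARN is a placeholder, and the actual hands-on wires in the Lambda functions and SNS topic created earlier.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Two states: invoke a processing Lambda, then publish a notification
    # to SNS via the Step Functions service integration.
    definition = {
        "StartAt": "ProcessFile",
        "States": {
            "ProcessFile": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-file",
                "Next": "Notify",
            },
            "Notify": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
                    "Message.$": "$",
                },
                "End": True,
            },
        },
    }

    sfn.create_state_machine(
        name="demo-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder
    )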
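
For the workgroup hands-on, here is a minimal sketch that creates an Athena workgroup enforcing a query-result location and a per-query data-scan limit; the name, bucket, and limit are illustrative.

    import boto3

    athena = boto3.client("athena")

    athena.create_work_group(
        Name="analyst-workgroup",
        Configuration={
            # Force all queries in this workgroup to write results here.
            "ResultConfiguration": {
                "OutputLocation": "s3://my-athena-results/analysts/"
            },
            "EnforceWorkGroupConfiguration": True,
            # Cancel any query that scans more than ~10 GB of data.
            "BytesScannedCutoffPerQuery": 10 * 1024 ** 3,
        },
    )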

Learning Outcome

Upon completing the AWS Data Engineering course, participants can expect to achieve the following learning outcomes:

  • Master Data Ingestion Techniques
  • Implement Effective Data Storage Solutions
  • Develop and Manage ETL Processes
  • Integrate Data Across Multiple Services
  • Perform Data Analysis and Visualization
  • Ensure Data Security and Compliance
  • Optimize Performance and Cost Efficiency
  • Deploy Big Data Solutions
  • Integrate Machine Learning Models
  • Gain Practical Experience
  • Prepare for AWS Certification

Who this course is for?

The following professionals can advance their careers by taking AWS Data Engineering training:

  • Data Analysts
  • Data Engineers
  • Data Scientists
  • Database Architects
  • IT professionals and freshers who wish to build their careers in advanced data warehouse tools

FAQs

What is AWS Data Engineering?

AWS Data Engineering involves designing, building, and managing data pipelines and architectures on the Amazon Web Services (AWS) platform to collect, store, process, and analyze data efficiently.

Who is this course for?

This course is ideal for data engineers, data architects, data analysts, and IT professionals interested in learning how to leverage AWS services for managing and optimizing data workflows.

What are the prerequisites?

Basic knowledge of databases and SQL, plus familiarity with cloud computing concepts, is recommended. Experience with AWS services or data engineering principles is beneficial but not required.

What topics does the course cover?

The course covers data ingestion with AWS Data Pipeline and Amazon Kinesis, data storage using Amazon S3 and Amazon Redshift, ETL processes with AWS Glue and Amazon EMR, data analysis with tools like Amazon Athena and Amazon QuickSight, and more.

How long does the course take?

Course durations vary by provider and format (self-paced vs. instructor-led). Typically, it ranges from a few weeks to a few months, depending on the depth of coverage and your learning pace.

Will this course prepare me for AWS certification?

Yes, completing the AWS Data Engineering course will give you the knowledge and skills needed to pursue relevant AWS certifications, such as the AWS Certified Data Analytics – Specialty exam.

Does the course include hands-on practice?

Yes, the course includes hands-on labs and real-world projects that let you apply theoretical concepts to practical scenarios, gaining valuable experience in designing and managing data workflows on AWS.

How is the course delivered?

The course can be delivered through online self-paced learning platforms, live virtual instructor-led training (VILT), or blended formats that combine both approaches to accommodate different learning preferences.

What support is available during the course?

Participants typically have access to online forums, instructor Q&A sessions, and technical support for help with course content, labs, and projects.

How will this course benefit my career?

Completing this course makes you proficient in designing scalable data solutions on AWS and improves your career prospects in data engineering and analytics roles.

Certifications

Here are some AWS data-related certifications that you might find relevant:

AWS Certified Data Analytics – Specialty: This certification validates your expertise in using AWS data lakes and analytics services to derive insights from data.

AWS Certified Database – Specialty: This certification is for those who demonstrate an understanding of AWS database services and how to help an organization leverage the performance and benefits of modern database solutions.

AWS Certified Data Engineer – Associate: This newer certification for data engineers validates skills and knowledge in core data-related AWS services, including the ability to ingest and transform data, orchestrate data pipelines, design data models, manage data life cycles, and ensure data quality.


Prerequisites

There are no mandatory prerequisites for learning AWS Data Engineering, but basic knowledge of or experience with data warehousing and SQL is an added advantage.

Our Other Courses

The AWS Data Engineering course is designed to provide in-depth knowledge and practical skills required to build, maintain, and optimize data pipelines.

In this Azure Data Engineering training course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure.

RBCloudGenX’s Snowflake training online is aligned with the latest curriculum of the Snowflake certification exam.

RBCloudGenX Databricks course is designed to equip learners with the knowledge and skills necessary to work with Apache Spark and Databricks.
