Introduction to AWS Glue Studio


2: Data and Processing

Before you start the workshop, let’s look at the data and the processing you will configure by creating an AWS Glue job in AWS Glue Studio.

You will configure a small data lake with sample customer data. The data is stored in Amazon S3, and the data lake is created with AWS Glue and AWS Lake Formation.

The customer data has the following schema:

fields: {CUSTOMERID, CUSTOMERNAME, EMAIL, CITY, COUNTRY, TERRITORY, CONTACTFIRSTNAME, CONTACTLASTNAME}

You will create a Glue job in Glue Studio that reads the {CUSTOMERNAME, EMAIL} fields from the dataset and writes them to an S3 bucket.
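To make the transformation concrete, here is a minimal local sketch of the same column selection in plain Python. It is not the Glue job itself, which you will build visually in Glue Studio. It assumes the data is CSV with a header row (the workshop text does not state the file format), and the function name select_fields and the sample row are hypothetical, invented only for illustration.

```python
import csv
import io

# Schema of the customers dataset, as listed in this workshop.
CUSTOMER_FIELDS = [
    "CUSTOMERID", "CUSTOMERNAME", "EMAIL", "CITY",
    "COUNTRY", "TERRITORY", "CONTACTFIRSTNAME", "CONTACTLASTNAME",
]

# Fields the Glue job reads and writes out.
SELECTED_FIELDS = ["CUSTOMERNAME", "EMAIL"]


def select_fields(csv_text, fields):
    """Return CSV text that keeps only the given columns (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only the requested columns from each input row.
        writer.writerow({name: row[name] for name in fields})
    return out.getvalue()


# Hypothetical sample row, invented for illustration; assumes CSV input.
SAMPLE_CSV = (
    ",".join(CUSTOMER_FIELDS) + "\n"
    "1,Example Corp,contact@example.com,Paris,France,EMEA,Jane,Doe\n"
)

if __name__ == "__main__":
    print(select_fields(SAMPLE_CSV, SELECTED_FIELDS))
```

In the workshop, the Glue job performs this projection on the data in Amazon S3, so the sketch only shows what the output should contain.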

Let’s start building.