Before you start the workshop, let’s look at the data and the processing you will configure by creating an AWS Glue job in AWS Glue Studio.
You will set up a small data lake with sample customer data. The data is stored in Amazon S3, and the data lake is created with AWS Glue and AWS Lake Formation.
The customer data has the following schema:
Fields: {CUSTOMERID, CUSTOMERNAME, EMAIL, CITY, COUNTRY, TERRITORY, CONTACTFIRSTNAME, CONTACTLASTNAME}
You then create an AWS Glue job in AWS Glue Studio that performs the following transformations using a custom transformation:
- Concatenate the CONTACTFIRSTNAME and CONTACTLASTNAME fields into a new CONTACTFULLNAME field.
- Drop the CONTACTFIRSTNAME and CONTACTLASTNAME fields.
Finally, the job writes the transformed data back to the S3 bucket.
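To make the transformation concrete, here is a minimal plain-Python sketch of the record-level logic. It is an illustration only, not the Glue job itself: the function name `merge_contact_name`, the sample record, and the single-space separator between first and last name are all assumptions made for this example. The actual job applies the same idea to the data set inside AWS Glue.

```python
from typing import Dict, List

# Fields that are merged and then dropped, as described in the steps above.
NAME_FIELDS = ("CONTACTFIRSTNAME", "CONTACTLASTNAME")


def merge_contact_name(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record with CONTACTFULLNAME added and the
    first/last name fields removed (hypothetical helper for illustration)."""
    first = record.get("CONTACTFIRSTNAME", "")
    last = record.get("CONTACTLASTNAME", "")

    # Keep every field except the two name parts.
    result = {key: value for key, value in record.items() if key not in NAME_FIELDS}

    # Assumption: the two parts are joined with a single space.
    result["CONTACTFULLNAME"] = f"{first} {last}".strip()
    return result


# Made-up sample record that follows the schema shown earlier.
customers: List[Dict[str, str]] = [
    {
        "CUSTOMERID": "1001",
        "CUSTOMERNAME": "Example Corp",
        "EMAIL": "jane.doe@example.com",
        "CITY": "Seattle",
        "COUNTRY": "USA",
        "TERRITORY": "NA",
        "CONTACTFIRSTNAME": "Jane",
        "CONTACTLASTNAME": "Doe",
    }
]

transformed = [merge_contact_name(record) for record in customers]
print(transformed[0])
```

After the transformation, each record keeps CUSTOMERID through TERRITORY unchanged, gains CONTACTFULLNAME, and no longer contains the two separate name fields.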
Let’s start building.