Part 1 of the workshop focused on setting up the data lake and the development environment, and then creating a job to process the data. Part 2 focuses on PySpark, covering different methods of data transformation and processing; a brief preview sketch follows the agenda below. The agenda for the Part 2 workshop is:
- Check Schema of the Data
- Query the Data from the Source
- Update the Data
- Aggregation Functions
- Merge & Split Data Sets
- Write / Load Data at the Destination
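As a quick preview of these steps, here is a minimal PySpark sketch touching each agenda item. This is not the workshop's own code: the S3 paths, file format, and column names (`order_id`, `amount`, `region`) are placeholders, and a real Glue job would obtain its session through a `GlueContext` rather than building one directly.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create a SparkSession (in a Glue job this comes from the GlueContext).
spark = SparkSession.builder.appName("pyspark-preview").getOrCreate()

# Read a sample data set; the path and columns are placeholders.
df = spark.read.csv("s3://my-bucket/sales/", header=True, inferSchema=True)

# Check schema of the data.
df.printSchema()

# Query the data from the source.
df.select("order_id", "amount").filter(F.col("amount") > 100).show(5)

# Update the data by deriving a new column.
df = df.withColumn("amount_with_tax", F.col("amount") * 1.1)

# Aggregation functions.
df.groupBy("region").agg(F.sum("amount").alias("total_amount")).show()

# Merge (union) and split data sets; `other` stands in for a second
# DataFrame with the same schema.
other = df
merged = df.unionByName(other)
high = df.filter(F.col("amount") > 100)
low = df.filter(F.col("amount") <= 100)

# Write / load the result at the destination.
merged.write.mode("overwrite").parquet("s3://my-bucket/output/")
```

Part 2 of the workshop walks through each of these operations in detail inside an AWS Glue job.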
Click on Building AWS Glue Job using PySpark - Part:2(of 2) to continue to Part 2 of the workshop.