You will create an Amazon S3 bucket that serves two purposes. It stores the sample data file customers.csv in the customers folder, which is cataloged in the Glue database. It also holds the output folder, where the EMR job writes the processed data.
Download the sample data file customers.csv from the link.
Go to the S3 Management Console and create an S3 bucket named dojo-lake. S3 bucket names are globally unique, so if dojo-lake is not available, choose a different name that is. In this bucket, create two folders: customers and output.
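If you prefer to script this step instead of using the console, the following is a minimal sketch using boto3, the AWS SDK for Python. It assumes you pass in a boto3 S3 client that already has valid credentials; the function name `create_lake_bucket` and the `FOLDERS` list are hypothetical names chosen for illustration. A "folder" in the S3 console is a zero-byte object whose key ends in `/`, so the sketch creates one such object per folder.

```python
from typing import Any, List

# Hypothetical list of the folders this step asks for.
FOLDERS = ["customers", "output"]


def create_lake_bucket(s3: Any, bucket_name: str, region: str = "us-east-1") -> List[str]:
    """Create the bucket and one folder placeholder per entry in FOLDERS.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3", region_name=region).
    Returns the folder keys that were created.
    """
    if region == "us-east-1":
        # us-east-1 is the default location, so no LocationConstraint is passed.
        s3.create_bucket(Bucket=bucket_name)
    else:
        # Every other region must be named explicitly.
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

    keys = []
    for folder in FOLDERS:
        key = f"{folder}/"
        # A zero-byte object whose key ends in "/" shows up as a folder in the console.
        s3.put_object(Bucket=bucket_name, Key=key, Body=b"")
        keys.append(key)
    return keys
```

Usage, assuming boto3 is installed and credentials are configured: `create_lake_bucket(boto3.client("s3", region_name="us-east-1"), "dojo-lake")`. If the name is taken, `create_bucket` raises a `botocore.exceptions.ClientError`, and you would rerun it with a different bucket name.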
Open the customers folder and upload customers.csv to it.
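The upload can also be scripted. This minimal sketch assumes a boto3 S3 client, a local copy of customers.csv in the working directory, and the bucket created in the previous step; the function name `upload_to_customers` is hypothetical. It relies on the boto3 client method `upload_file(Filename, Bucket, Key)` and places the file under the `customers/` prefix so that it lands inside the customers folder.

```python
import os
from typing import Any


def upload_to_customers(s3: Any, bucket_name: str, local_path: str = "customers.csv") -> str:
    """Upload a local file into the customers folder of the bucket.

    `s3` is assumed to be a boto3 S3 client. Returns the object key used.
    """
    # Keep only the file name so the object sits directly under customers/.
    key = "customers/" + os.path.basename(local_path)
    s3.upload_file(local_path, bucket_name, key)
    return key
```

For example, `upload_to_customers(boto3.client("s3"), "dojo-lake")` would store the file as `customers/customers.csv`. Replace `dojo-lake` with your own bucket name if you had to pick a different one.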
The data is ready. In the next step, you create a Glue Database.