Introduction to AWS Glue Studio

7: Create Job in Glue Studio

In this step, you use Glue Studio to create a job that reads data from the customers table and writes only the {CUSTOMERNAME, EMAIL} fields to an S3 bucket folder.

  1. First, grant the dojogluerole role Select permission on the customers table, because the job you are going to create runs under this role.

  2. Go to the Lake Formation console. Click the Tables menu on the left. Then select the customers table and click the Grant option under Actions.

  3. On the next screen, select the My account option. Select dojogluerole under IAM users and roles. Choose Select for the table permission. Keep the rest of the configuration at the defaults and click the Grant button.
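The same grant can also be issued programmatically. The following is a minimal sketch using the boto3 Lake Formation client, not part of the original workshop. It assumes the customers table lives in the dojodb database (the database selected later in this step), that dojogluerole has no IAM path, and that you supply your own 12-digit account ID; the helper names are hypothetical.

```python
def build_grant_request(account_id, role_name="dojogluerole",
                        database="dojodb", table="customers"):
    """Build the keyword arguments for Lake Formation grant_permissions.

    Assumption: the role has no IAM path, so its ARN is
    arn:aws:iam::<account_id>:role/<role_name>.
    """
    return {
        "Principal": {
            "DataLakePrincipalIdentifier":
                f"arn:aws:iam::{account_id}:role/{role_name}"
        },
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def grant_select(account_id):
    """Send the grant to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("lakeformation")
    return client.grant_permissions(**build_grant_request(account_id))
```

Calling grant_select with your account ID should have the same effect as the console steps above, provided the caller is a Lake Formation administrator or otherwise allowed to grant on the table.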

  4. The role now has the required permission. Go to the Glue console and click the AWS Glue Studio menu on the left.

  5. On the next screen, click on the Create and manage jobs link.

  6. On the next screen, select the Blank graph option and click the Create button.

  7. The Glue Studio editor opens. Click the Job Details tab. Type dojogluejob for the name and select dojogluerole for the IAM Role. Keep the rest of the fields at their defaults and click the Save button.

  8. The job name and role configuration are saved. Go back to the Visual tab and click the + icon.

  9. A new node is created in the editor. On the Node properties tab, type Read-Source for the name. Select Data source - S3 Bucket for the node type. Click the Data source properties - S3 tab.

  10. On the Data source properties - S3 tab, select dojodb as the database and customers as the table. This makes the customers table in dojodb the source. Click the Save button. The job is saved.

  11. On the Visual tab, click the + icon to add a new node. On the Node properties tab, type Select-Data for the name. Select Transform - SelectFields for the node type. Keep the Node parents field set to Read-Source. Then click the Transform tab.

  12. On the Transform tab, select the customername and email fields and click the Save button. The SelectFields transformation narrows the data down to the customername and email columns. The job is saved.

  13. On the Visual tab, click the + icon to add a new node. On the Node properties tab, type Write-Data for the name. Select Data target - S3 Bucket for the node type. Keep the Node parents field set to Select-Data. Then click the Data target properties - S3 tab.

  14. On the Data target properties - S3 tab, keep the format as JSON and the compression type as None. Select s3://dojo-customer-data/output/ for the S3 target location. If you created the bucket with a different name, replace the dojo-customer-data part with that name. Click the Save button. The job will now write the transformed data to the S3 bucket as uncompressed JSON. The job is saved.
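The three nodes configured so far (Read-Source, Select-Data, Write-Data) correspond to a short Glue ETL script. Glue Studio generates its own script from the visual graph; the sketch below is a hand-written approximation of that pipeline, not the generated code. It runs only inside a Glue job environment where the awsglue and pyspark libraries are available, and the transformation_ctx labels and constant names are assumptions chosen for illustration.

```python
DATABASE = "dojodb"
TABLE = "customers"
SELECTED_FIELDS = ["customername", "email"]
# Replace the bucket name if yours differs, as noted in the step above.
TARGET_PATH = "s3://dojo-customer-data/output/"


def run_job():
    """Read the catalog table, keep two fields, write JSON to S3."""
    # Imported lazily: these modules exist only in the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.transforms import SelectFields
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read-Source: the customers table registered in the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=DATABASE,
        table_name=TABLE,
        transformation_ctx="ReadSource",  # hypothetical label
    )

    # Select-Data: keep only the customername and email fields.
    selected = SelectFields.apply(
        frame=source,
        paths=SELECTED_FIELDS,
        transformation_ctx="SelectData",  # hypothetical label
    )

    # Write-Data: uncompressed JSON in the output folder.
    glue_context.write_dynamic_frame.from_options(
        frame=selected,
        connection_type="s3",
        format="json",
        connection_options={"path": TARGET_PATH},
        transformation_ctx="WriteData",  # hypothetical label
    )
```

A production script would usually also initialize and commit a Job object so that features such as job bookmarks work; that part is omitted here to keep the data flow visible.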

  15. The job is ready. Click on the Run button for the job. It will start the job execution. Click on the Run Details link to check the job status.

  16. Wait until the status of the job changes to Succeeded.
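Running the job and waiting for it to finish can also be scripted. The following is a minimal sketch using the boto3 Glue client, assuming the job is named dojogluejob as configured earlier. The helper names, the polling interval, and the poll limit are assumptions for illustration. The client is passed in as a parameter so the polling logic can be exercised without AWS access.

```python
import time

# Job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(glue, job_name, run_id, poll_seconds=15.0, max_polls=240):
    """Poll get_job_run until the run reaches a terminal state."""
    for _ in range(max_polls):
        response = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} run {run_id} did not finish in time")


def run_and_wait(job_name="dojogluejob"):
    """Start the job and block until it finishes (needs boto3 and credentials)."""
    import boto3  # imported lazily so the poller works without boto3

    glue = boto3.client("glue")
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    return wait_for_job_run(glue, job_name, run_id)
```

A return value of "SUCCEEDED" corresponds to the Succeeded status shown in the console; any other returned state means the run ended without completing successfully.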

  17. You can go to the output S3 bucket location to see a new file created by the Glue Job.

  18. If you download and open the file, you can see the output JSON data.
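The output can also be inspected from code instead of downloading it by hand. The sketch below assumes the job wrote newline-delimited JSON (one record per line), which is how Glue typically writes JSON output, and that the bucket and folder are dojo-customer-data and output/ as configured above. The helper names are hypothetical.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_output(bucket="dojo-customer-data", prefix="output/"):
    """Fetch and parse every object under the output prefix.

    Needs boto3 and AWS credentials. list_objects_v2 returns at most
    1000 keys per call, which is enough for this workshop's output.
    """
    import boto3  # imported lazily so the parser works without boto3

    s3 = boto3.client("s3")
    records = []
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        records.extend(parse_json_lines(body.decode("utf-8")))
    return records
```

Each parsed record should contain only the customername and email keys, since those are the fields kept by the Select-Data node.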

  19. This completes the workshop, in which you learned to create a Glue job using Glue Studio. Follow the next step to clean up the resources so that you don’t incur any cost after the workshop.