Using Amazon Redshift in an AWS-based Data Lake


7. Developer Endpoint

The data lake is ready. You now create a developer endpoint, which is used to write and test PySpark code that works with the Redshift data in the data lake. The code can later be turned into a Glue job, although creating the job is not covered in this exercise.

  1. Go to the AWS Glue console, click on the Dev endpoints option in the left menu and then click on the Add endpoint button.

  2. On the next screen, type in dojoendpoint as the name. Select dojo-glue-role as the IAM role. Then click on the Next button.

  3. On the next screen, select Choose a connection as the option. Select dojoconnection as the connection. Click on the Next button.

  4. On the next Add an SSH public key (Optional) screen, click on the Next button.

  5. On the next Review screen, click on the Finish button. The endpoint creation will start.

  6. It will take about 8-10 minutes for the developer endpoint to be ready. Wait till the status changes to READY.

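     If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch along the lines below should create an equivalent endpoint and wait for it. It reuses the dojoconnection and dojo-glue-role names from the earlier steps; the region, the account id in the role ARN, the Glue version and the node count are assumptions you would adjust.

     import time
     import boto3

     glue = boto3.client("glue", region_name="us-east-1")  # assumption: region used for the exercise

     # Reuse the network settings of the dojoconnection connection created earlier.
     reqs = glue.get_connection(Name="dojoconnection")["Connection"]["PhysicalConnectionRequirements"]

     glue.create_dev_endpoint(
         EndpointName="dojoendpoint",
         RoleArn="arn:aws:iam::111111111111:role/dojo-glue-role",  # placeholder account id
         SubnetId=reqs["SubnetId"],
         SecurityGroupIds=reqs["SecurityGroupIdList"],
         GlueVersion="1.0",   # assumption
         NumberOfNodes=2,     # assumption: smallest useful size
     )

     # Poll until the endpoint reaches the READY status (usually 8-10 minutes).
     while True:
         status = glue.get_dev_endpoint(EndpointName="dojoendpoint")["DevEndpoint"]["Status"]
         print("Endpoint status:", status)
         if status in ("READY", "FAILED"):
             break
         time.sleep(60)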

  7. Once the developer endpoint is ready, select it and click on Create SageMaker notebook under the Action dropdown menu.

  8. On the next screen, enter dojonotebook as the notebook name, select Create an IAM role as the option, and type in dojonotebookrole for the IAM role. Keep the rest of the configuration at its default values and click on the Create notebook button.

  9. The notebook creation will start. Wait till the notebook status changes to Ready.

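     If you want to check the notebook status from code rather than the console, a small boto3 sketch like the one below should do it. It assumes the Glue console created the underlying SageMaker instance as aws-glue-dojonotebook (the console normally prefixes the name you typed with aws-glue-).

     import boto3

     sagemaker = boto3.client("sagemaker", region_name="us-east-1")  # assumption: same region as the endpoint

     # Assumption: the instance name is the typed name with the usual aws-glue- prefix.
     resp = sagemaker.describe_notebook_instance(NotebookInstanceName="aws-glue-dojonotebook")
     print(resp["NotebookInstanceStatus"])  # Pending while provisioning, InService once ready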

  10. The development environment is ready. Let's do some PySpark programming in the notebook to work with Redshift data through the data lake.
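
As a preview of the next section, a notebook cell that works with Glue from PySpark typically begins by creating a GlueContext on top of the Spark context provided by the notebook kernel, roughly as sketched below. The actual code to access the Redshift data is covered in the next section.

    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # The notebook's Sparkmagic/Livy kernel already provides a Spark context.
    sc = SparkContext.getOrCreate()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session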