Getting Started with Amazon EMR

3. Launch EMR Cluster

In this step, you launch an EMR cluster, which is used to process data with PySpark scripts.

  1. Go to the EMR Management console and click the Create cluster button.

  2. On the next screen, click the Go to advanced options link.

  3. On the next screen, keep the default selection for the Release field. Under software configuration, select JupyterEnterpriseGateway 2.1.0 and Spark 2.4.7 in addition to the applications selected by default. Keep the rest of the configuration at its defaults and click the Next button.

  4. On the next screen, select Uniform instance groups for the Instance group configuration. Select the default VPC for the network and one of its default subnets. Keep the rest of the configuration at its defaults and click the Next button.

  5. On the next screen, enter dojocluster as the cluster name. Keep the rest of the configuration at its defaults and click the Next button.

  6. On the next screen, keep the default configuration and click the Create cluster button.

  7. Cluster creation starts and takes some time to finish. Wait until the cluster status changes to Cluster ready.

  8. The cluster is now ready. You will launch a Jupyter Notebook in the next step.
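
The console steps above can also be approximated in code. The following is a minimal sketch using boto3, not the workshop's own method. It rests on several assumptions: the instance type (m5.xlarge), the number of core nodes, the helper names build_cluster_request and launch_and_wait, and the use of the default EMR roles (EMR_DefaultRole and EMR_EC2_DefaultRole) are all illustrative choices. The release label and subnet ID must be supplied by you. Only the two applications added in step 3 are listed, so the console's default application selection is not reproduced here.

```python
import json


def build_cluster_request(subnet_id, release_label, name="dojocluster",
                          instance_type="m5.xlarge", core_count=2):
    """Build keyword arguments for the EMR run_job_flow call.

    instance_type and core_count are assumptions for illustration;
    the tutorial keeps whatever the console offers by default.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Only the applications added in step 3 are listed here.
        "Applications": [
            {"Name": "Spark"},
            {"Name": "JupyterEnterpriseGateway"},
        ],
        "Instances": {
            # Uniform instance groups: one master node plus core nodes.
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2SubnetId": subnet_id,
            # Keep the cluster running so a notebook can attach later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names; they must already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_and_wait(request, region_name=None):
    """Create the cluster and block until it is running or waiting."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    emr = boto3.client("emr", region_name=region_name)
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    return cluster_id


if __name__ == "__main__":
    # Placeholder values: substitute your own subnet ID and release label.
    example = build_cluster_request("subnet-0123456789abcdef0", "emr-5.32.0")
    print(json.dumps(example, indent=2))
```

Running the script only prints the request, so it does not touch your AWS account. Calling launch_and_wait(example) would create a billable cluster and requires configured AWS credentials. The emr-5.32.0 label in the example is a placeholder; check which release in your region ships Spark 2.4.7 and JupyterEnterpriseGateway 2.1.0 before using it.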