Using AWS Lake Formation Blueprint

6. Configure Lake Formation

In this step, you configure Lake Formation. You first create a database and then configure a Glue connection to the RDS instance.

  1. Open the AWS Lake Formation console. If you are using Lake Formation for the first time in the region, it asks you to create a data lake administrator. A data lake administrator is an IAM user or IAM role that performs administrative tasks on the data lake. First-time users see a popup message prompting them to add administrators. Click on the Add administrators button to create administrators for your data lake.

  2. Select the IAM user you are logged in as from the drop-down list. For the rest of the workshop, this user is the data lake administrator and has full access to the data lake. Click on the Save button.

    Note: If you did not get the popup, the data lake already has an administrator. You can check this by clicking on the “Admins and database creators” menu on the left. If your logged-in IAM username is listed as a “Data lake administrator”, you can move on to the next step. Otherwise, click on the “Grant” button to add your logged-in IAM user as an administrator of the data lake.
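
For readers who prefer scripting, the administrator setup above can be sketched with boto3. This is a minimal sketch, not part of the workshop: the helper names `with_admin` and `add_data_lake_admin` are hypothetical, and it assumes your credentials are allowed to read and write the data lake settings. Because `put_data_lake_settings` replaces the whole settings object, the sketch reads the current settings first and only appends the new administrator.

```python
import copy


def with_admin(settings: dict, principal_arn: str) -> dict:
    """Return a copy of the data lake settings with principal_arn listed as an admin."""
    updated = copy.deepcopy(settings)
    admins = updated.setdefault("DataLakeAdmins", [])
    # Add the principal only if it is not already an administrator.
    if not any(a.get("DataLakePrincipalIdentifier") == principal_arn for a in admins):
        admins.append({"DataLakePrincipalIdentifier": principal_arn})
    return updated


def add_data_lake_admin(principal_arn: str) -> dict:
    """Read the current settings, add the admin, and write the settings back."""
    import boto3  # imported here so with_admin() can be used without boto3 installed

    client = boto3.client("lakeformation")
    current = client.get_data_lake_settings()["DataLakeSettings"]
    updated = with_admin(current, principal_arn)
    client.put_data_lake_settings(DataLakeSettings=updated)
    return updated
```

You would call `add_data_lake_admin` with the ARN of your own IAM user, for example `arn:aws:iam::123456789012:user/your-user`, where the account ID and user name are placeholders.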

  3. After adding the administrator, you create the database. In the AWS Lake Formation console, click on the Databases option in the left menu and then click on the Create database button.

  4. On the next screen, select the Database option. Enter dojodb as the Name. Make sure you uncheck the option Use only IAM access control for new tables in this database. Leave the rest of the options at their defaults and click on the Create database button.

  5. The database is created almost immediately. You now give the Glue IAM role access to the database. The crawler uses this IAM role to create catalog tables in the database. Select the database and click on the Grant option under the Actions menu.

  6. On the next screen, select the My account option. Select dojo-glue-role in the IAM users and roles field. Check the Super option under the database permissions. Finally, click on the Grant button.
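
The database creation and the grant above can also be sketched with boto3. This is an illustration under stated assumptions rather than the workshop's method: the helper names are hypothetical, an empty `CreateTableDefaultPermissions` list is assumed to be the API equivalent of unchecking "Use only IAM access control for new tables in this database", and the API permission `ALL` is assumed to correspond to the console's Super option.

```python
def database_input(name: str = "dojodb") -> dict:
    """Build the DatabaseInput for glue.create_database."""
    return {
        "Name": name,
        # Assumption: an empty list means new tables do not get the default
        # IAM-only permissions, matching the unchecked console option.
        "CreateTableDefaultPermissions": [],
    }


def grant_request(role_arn: str, database: str = "dojodb") -> dict:
    """Build the keyword arguments for lakeformation.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": ["ALL"],  # assumed to be shown as "Super" in the console
    }


def create_and_grant(role_arn: str, database: str = "dojodb") -> None:
    """Create the database, then grant the Glue role access to it."""
    import boto3  # imported here so the builders above work without boto3 installed

    boto3.client("glue").create_database(DatabaseInput=database_input(database))
    boto3.client("lakeformation").grant_permissions(**grant_request(role_arn, database))
```

You would pass the ARN of dojo-glue-role, for example `arn:aws:iam::123456789012:role/dojo-glue-role`, where the account ID is a placeholder.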

  7. The role now has access to the database. It is time to create the Glue connection to the RDS database. In the AWS Glue console, select the Connections menu on the left and then click on the Add connection button.

  8. On the next screen, type in dojoconnection as the connection name. Select JDBC as the connection type. Click on the Next button.

  9. On the next screen, type in the JDBC URL in the format jdbc:protocol://host:port/databasename. The protocol is mysql, the host is the RDS endpoint URL you noted in the earlier steps, the port is 3306, and the database name is dojodatabase. Type in admin as the username. Type in Password1! as the password. Select the default VPC. Select one of the subnets listed. Select dojo-mysql-sg as the security group. Click on the Next button.
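
To make the JDBC URL format concrete, here is a small sketch that assembles the URL from its parts and builds the input for an equivalent boto3 `create_connection` call. The helper names, the subnet ID, the security group ID, and the availability zone are hypothetical placeholders, not values from the workshop. The security group ID would be the ID of dojo-mysql-sg, and the availability zone would be the one your chosen subnet belongs to.

```python
def build_jdbc_url(host: str, port: int = 3306,
                   database: str = "dojodatabase", protocol: str = "mysql") -> str:
    """Assemble a URL of the form jdbc:protocol://host:port/databasename."""
    return f"jdbc:{protocol}://{host}:{port}/{database}"


def connection_input(host: str, password: str, subnet_id: str,
                     security_group_id: str, availability_zone: str,
                     username: str = "admin", name: str = "dojoconnection") -> dict:
    """Build the ConnectionInput for glue.create_connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": build_jdbc_url(host),
            "USERNAME": username,
            "PASSWORD": password,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": availability_zone,
        },
    }


def create_connection(**kwargs) -> None:
    """Create the Glue connection from the arguments of connection_input()."""
    import boto3  # imported here so the builders above work without boto3 installed

    boto3.client("glue").create_connection(ConnectionInput=connection_input(**kwargs))
```

For example, `build_jdbc_url("mydb.example.us-east-1.rds.amazonaws.com")` returns `jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/dojodatabase`, where the host name is a made-up endpoint.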

  10. On the next screen, click on the Finish button to create the connection. Once the connection is created, select it and click on the Test connection button.

  11. In the popup window, select dojo-glue-role as the IAM Role and click on the Test connection button.

  12. The connection test starts. Wait until the connection status shows as Successful.

  13. The connection is ready. In the next step, you configure the Blueprint to create an ETL workflow that extracts data from the relational database and loads it into the S3 bucket.