In this step, you configure a Blueprint template to create an ETL workflow from the RDS instance source to the S3 bucket destination. The workflow also catalogs the source and destination data.
-
Go to the AWS Lake Formation console, click on the Blueprints option in the left menu and then click on the Use blueprint button.
-
On the next screen, select the Database snapshot option. Select dojoconnection for the database connection. Type in dojodatabase/% for the source data path. This path imports all the data from dojodatabase in the RDS instance.
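To make the source data path more concrete, here is a minimal sketch of how a pattern such as dojodatabase/% can be read. It assumes that % acts as a wildcard within one path segment, which is how this step uses it. The helper names and the table names in the usage note are hypothetical, and this is not how Lake Formation itself evaluates the path.

```python
import re


def source_path_to_regex(pattern):
    """Turn a '<database>/<table>' pattern with % wildcards into a regex.

    Assumption: % matches any run of characters inside a single path
    segment, so it never crosses a '/' boundary.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + "[^/]*".join(literal_parts) + "$")


def matches_source_path(pattern, table_path):
    """Return True if table_path (e.g. 'dojodatabase/customers') matches."""
    return source_path_to_regex(pattern).match(table_path) is not None
```

For example, `matches_source_path("dojodatabase/%", "dojodatabase/customers")` is true, while a table in any other database does not match.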
-
On the same screen, select dojodb for the target database. Select s3://dojo-target-bucket for the target storage location. If you created the bucket with a different name, select that one instead. Select Parquet for the data format. You can choose any other data format if you prefer.
-
On the same screen, select Run on demand for the frequency.
-
On the same screen, type in dojoblueprintworkflow for the workflow name. Select dojo-glue-role for the IAM role. Type in the table prefix. Click on the Create button.
-
The Blueprint workflow configuration is ready within a few moments. Once it is ready, select the Blueprint workflow and click on the Start option under the Actions menu to start the execution of the workflow.
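If you prefer to start the workflow from code instead of the console, a sketch along the following lines may work. It assumes the Blueprint created an AWS Glue workflow with the same name you entered (dojoblueprintworkflow) and that you pass in a boto3 Glue client. The function name is hypothetical.

```python
def start_blueprint_workflow(glue_client, workflow_name="dojoblueprintworkflow"):
    """Start one run of the Glue workflow and return its run id.

    glue_client is expected to be a boto3 Glue client, for example
    boto3.client("glue"). The default workflow name is the one used in
    this workshop and is an assumption about your setup.
    """
    response = glue_client.start_workflow_run(Name=workflow_name)
    return response["RunId"]
```

Keep the returned run id, because you need it to look up the status of this particular run later.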
-
The workflow execution starts. It might take quite some time to finish. Wait until the status changes to COMPLETED.
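Instead of refreshing the console, you could poll the run status from code. The sketch below assumes a boto3 Glue client, a workflow name, and a run id are passed in. The function name, the polling interval, and the retry limit are hypothetical choices, not values from this workshop.

```python
import time

# Statuses after which a Glue workflow run no longer changes.
TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def wait_for_workflow_run(glue_client, workflow_name, run_id,
                          poll_seconds=30, max_polls=120):
    """Poll a workflow run until it reaches a terminal status.

    Returns the final status string, or raises TimeoutError if the run
    is still in progress after max_polls checks.
    """
    for _ in range(max_polls):
        run = glue_client.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        status = run["Status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Workflow run {run_id} did not finish in time")
```

A run that returns COMPLETED corresponds to the status you wait for in the console. Any other terminal status suggests checking the individual jobs and crawlers.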
-
The ETL workflow execution is complete. If you check the tables in Lake Formation, you can see the source database tables in the catalog. You can also see the target data in the S3 bucket, along with its catalog entries in Lake Formation. The source data is in MySQL, while the destination data is in the S3 bucket in Parquet format.
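The catalog check can also be done from code. This sketch lists every table in a Data Catalog database together with its storage location. It assumes a boto3 Glue client is passed in and that the tables were cataloged in the dojodb database. The function name is hypothetical.

```python
def list_catalog_tables(glue_client, database_name="dojodb"):
    """Return (table name, storage location) pairs for one catalog database.

    Uses the get_tables paginator so that databases with many tables
    are read completely. Tables without a location get an empty string.
    """
    paginator = glue_client.get_paginator("get_tables")
    rows = []
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            storage = table.get("StorageDescriptor", {})
            rows.append((table["Name"], storage.get("Location", "")))
    return rows
```

Tables whose location starts with s3:// are the destination copies. The remaining entries describe the source tables in the RDS instance.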
-
The Blueprint configuration automatically creates multiple jobs, workflows and triggers in the background to complete the task. You can verify this in the AWS Glue console.
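One way to see what the Blueprint created is to read the workflow graph and count its nodes by type. The sketch below assumes a boto3 Glue client is passed in and that the workflow is named dojoblueprintworkflow. The function name is hypothetical.

```python
from collections import Counter


def count_workflow_nodes(glue_client, workflow_name="dojoblueprintworkflow"):
    """Count the nodes of a Glue workflow graph by type.

    Node types reported by Glue are CRAWLER, JOB and TRIGGER. A workflow
    without a graph yields an empty Counter.
    """
    workflow = glue_client.get_workflow(Name=workflow_name, IncludeGraph=True)["Workflow"]
    nodes = workflow.get("Graph", {}).get("Nodes", [])
    return Counter(node["Type"] for node in nodes)
```

The resulting counts give a quick summary of how many jobs, crawlers and triggers sit behind the single Blueprint you configured.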
-
This finishes the workshop. Clean up the resources so that you don’t incur any cost after the workshop.