Important Note: You will create AWS resources during the workshop, which will incur costs in your AWS account. Clean up the resources as soon as you finish the workshop to minimize costs.

Using AWS Glue ETL Job with Streaming Data

Recently AWS announced streaming data support for AWS Glue ETL jobs, which helps in setting up continuous ingestion pipelines that process streaming data on the fly. Streaming ETL jobs consume data from streaming sources like Amazon Kinesis and Apache Kafka, clean and transform those data streams in flight, and continuously load the results into Amazon S3 data lakes, data warehouses, or other data stores.

In this workshop, you create an ETL job that reads streaming data from a Kinesis data stream and uploads it to an Amazon S3 bucket. The ETL job transforms the data from JSON to CSV format. The data is published to the Kinesis stream by an MQTT client through AWS IoT Core.
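
To make the JSON-to-CSV step concrete, here is a minimal local sketch of that conversion in plain Python. It is not the Glue job built in this workshop and uses no AWS services. The field names (device_id, temperature, humidity) and the sample messages are hypothetical, chosen only for illustration.

```python
import csv
import io
import json


def json_records_to_csv(json_lines, fieldnames):
    """Convert an iterable of JSON-encoded records into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in json_lines:
        record = json.loads(line)
        # Keep only the expected columns; a missing field becomes an empty cell.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical sensor payloads, standing in for messages read from the stream.
sample_messages = [
    '{"device_id": "sensor-1", "temperature": 21.5, "humidity": 40}',
    '{"device_id": "sensor-2", "temperature": 19.0}',
]

csv_text = json_records_to_csv(
    sample_messages, ["device_id", "temperature", "humidity"]
)
print(csv_text)
```

A fixed column list keeps the CSV layout stable even when individual messages omit a field, which is a common situation with device data.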

The following diagram shows the scenario you are going to build. Start the workshop

[Diagram: Workshop7]