AWS Kinesis Data Transformation using Glue

4: Configure Kinesis Data and Delivery Stream

In this step, you create a Kinesis data stream and a Kinesis Firehose delivery stream. The delivery stream ingests data from the data stream, converts the records from JSON to Parquet format using the schema in the Glue Data Catalog, and then writes the result to the S3 bucket.

  1. Go to the Kinesis Management console, select the Kinesis Data Streams option, and click the Create data stream button.

  2. On the next screen, type dojostream as the stream name and 1 for the Number of open shards. Click the Create data stream button.

  3. The data stream is created within a few moments. Next, click Kinesis Firehose in the left menu and then click the Create Delivery Stream button.

  4. On the next screen, type dojodeliverystream as the delivery stream name. Select the Kinesis Data Stream option as the source, select dojostream as the Kinesis data stream, and then click the Next button.

  5. On the next screen, in the Convert record format section, select Enabled for Record format conversion and Apache Parquet for the output format. Select Ireland as the AWS Glue Region, dojodatabase as the Glue Database, dojotable as the Glue Table, and Latest as the table version, and then click the Next button. This configures Parquet as the output format of the conversion, with the record schema read from dojotable.

  6. On the next screen, select Amazon S3 as the destination and dojo-kinesis-destination as the bucket. If you created the bucket with a different name, select that one instead. Click the Next button.

  7. On the next screen, type 60 (seconds) for the Buffer interval. Select the Choose existing IAM role option and select dojokinesisrole as the IAM role. Keep the remaining fields at their defaults and click the Next button.

  8. On the Review screen, click the Create delivery stream button. The delivery stream is created shortly afterwards.

  9. The next step is to create a client application that publishes data to the Kinesis stream.
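
If you prefer scripting to the console, the configuration from the steps above can be sketched with boto3 roughly as follows. This is a minimal sketch under several assumptions, not part of the original walkthrough: the account ID is a placeholder you must replace, dojokinesisrole is assumed to be usable for reading the data stream, reading the Glue schema and writing to S3, and the 128 MB buffer size is an assumed value since the steps above leave that field at its default. The helper function names are hypothetical.

```python
# Sketch: the console steps above expressed as boto3 request parameters.
REGION = "eu-west-1"          # Ireland, the Glue Region selected in step 5
ACCOUNT_ID = "123456789012"   # placeholder (assumption): use your own account ID


def data_stream_params():
    """Parameters for kinesis.create_stream (steps 1-2)."""
    return {"StreamName": "dojostream", "ShardCount": 1}


def delivery_stream_params(account_id=ACCOUNT_ID, region=REGION,
                           bucket="dojo-kinesis-destination"):
    """Parameters for firehose.create_delivery_stream (steps 3-8)."""
    # Assumption: one role covers the stream source, Glue schema and S3 access.
    role_arn = f"arn:aws:iam::{account_id}:role/dojokinesisrole"
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/dojostream"
    return {
        "DeliveryStreamName": "dojodeliverystream",
        # Step 4: the Kinesis data stream is the source.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        # Step 6: Amazon S3 is the destination.
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket}",
            # Step 7: 60-second buffer interval; the size is an assumed value.
            "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 128},
            # Step 5: JSON in, Parquet out, schema from the Glue Data Catalog.
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                "InputFormatConfiguration": {
                    "Deserializer": {"OpenXJsonSerDe": {}}
                },
                "OutputFormatConfiguration": {
                    "Serializer": {"ParquetSerDe": {}}
                },
                "SchemaConfiguration": {
                    "RoleARN": role_arn,
                    "DatabaseName": "dojodatabase",
                    "TableName": "dojotable",
                    "Region": region,
                    "VersionId": "LATEST",
                },
            },
        },
    }


def create_streams():
    """Create both streams; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the parameter helpers work without it

    kinesis = boto3.client("kinesis", region_name=REGION)
    firehose = boto3.client("firehose", region_name=REGION)
    kinesis.create_stream(**data_stream_params())
    # Wait until the data stream exists before attaching Firehose to it.
    kinesis.get_waiter("stream_exists").wait(StreamName="dojostream")
    firehose.create_delivery_stream(**delivery_stream_params())
```

Calling create_streams() would issue the two requests. The parameter helpers are kept separate so that you can print and review the configuration before anything is created in your account.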