AWS Kinesis Data Transformation using Glue

6: Create and Run Kinesis Client

Let’s build and run the Kinesis client in the Cloud9 environment.

  1. In the AWS Cloud9 environment, select the New File option from the File menu.

  2. A new file named Untitled1 opens. Copy and paste the following code into it.

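    import boto3
    import random
    
    # Create a Kinesis client using the default credentials and region.
    client = boto3.client('kinesis')
    
    # JSON payload; the keys must match the columns of the Glue Catalog table.
    mydata = '{ "firstname": "John", "lastname": "Smith", "age": 32}'
    
    # Random partition key to spread records across shards.
    partitionkey = random.randint(10, 100)
    
    # Send the record to the stream.
    response = client.put_record(StreamName='dojostream', Data=mydata, PartitionKey=str(partitionkey))
    
    # Print the response to confirm the record was accepted.
    print(response)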
    

  3. The code above first creates a Kinesis client and then uses the put_record method to send mydata to the Kinesis data stream dojostream. It then prints the response so you can check whether the call succeeded. Note the structure of the input data mydata: it is JSON, and it matches the schema defined in the Glue Catalog table. The Kinesis delivery stream will transform this data into Parquet format using the Glue Catalog table schema.
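
The success check described in step 3 can be sketched as below. This is only an illustration, not the workshop's code: the helper name `send_record` and the `FakeKinesis` class are hypothetical, introduced so the logic can be tried without AWS credentials, and the record's field names are assumed to match the Glue table's columns. The one fact relied on is that a successful put_record response contains `ShardId` and `SequenceNumber`.

```python
import json
import random


def send_record(client, stream_name, record):
    """Serialize `record` as JSON, send it, and return (shard_id, sequence_number)."""
    data = json.dumps(record).encode("utf-8")
    # Random partition key, as in the client above, to spread records across shards.
    partition_key = str(random.randint(10, 100))
    response = client.put_record(
        StreamName=stream_name, Data=data, PartitionKey=partition_key
    )
    # A successful put_record response carries the shard ID and sequence number.
    return response["ShardId"], response["SequenceNumber"]


class FakeKinesis:
    """Hypothetical stand-in for boto3.client('kinesis'), for local testing only."""

    def __init__(self):
        self.calls = []

    def put_record(self, StreamName, Data, PartitionKey):
        # Remember what was sent and return a response shaped like the real one.
        self.calls.append((StreamName, Data, PartitionKey))
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.calls)),
        }


fake = FakeKinesis()
shard_id, sequence_number = send_record(
    fake, "dojostream", {"firstname": "John", "lastname": "Smith", "age": 32}
)
print(shard_id, sequence_number)
```

With a real client you would pass `boto3.client('kinesis')` instead of `FakeKinesis()`. In that case a failed call surfaces as an exception (typically `botocore.exceptions.ClientError`) rather than as a field in the response.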

  4. Select the Save option from the File menu to save the client code. In the popup, enter kinesisclient.py as the filename and then click the Save button.

  5. Run the command python kinesisclient.py in the Cloud9 console to execute the client file.

  6. The file runs successfully. Now wait about 60 seconds for the Kinesis delivery stream to transform the data and write it to the S3 bucket. The wait is needed because you configured the delivery stream with a buffer condition of 128 MiB or 60 seconds, whichever is reached first. After 60 seconds, open the file in the dojo-kinesis-destination bucket (or in your own bucket, if you created it with a different name). The file is created under a date-based partition.
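
To locate the delivered object without browsing the bucket, the date-based partition from step 6 can be computed. The sketch below assumes the delivery stream uses the default Firehose S3 prefix, which is the UTC time in `YYYY/MM/dd/HH/` form, and that no custom prefix was configured. The helper name `default_firehose_prefix` is hypothetical.

```python
from datetime import datetime, timezone


def default_firehose_prefix(when):
    """Return the UTC YYYY/MM/DD/HH/ prefix for a timezone-aware datetime."""
    return when.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


# Example: data delivered at 2021-03-05 14:07 UTC.
example = datetime(2021, 3, 5, 14, 7, tzinfo=timezone.utc)
prefix = default_firehose_prefix(example)
print(prefix)  # 2021/03/05/14/
```

The resulting string could then be passed as the `Prefix` argument of the S3 client's `list_objects_v2` call, together with `Bucket='dojo-kinesis-destination'`, to list the files written in that hour.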

  7. When you download and open the file, you can see that the data has been transformed from JSON to Parquet format. Parquet is a binary format, so the file is not readable in a normal text editor.
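
One quick way to confirm that the downloaded file from step 7 really is Parquet is to check its magic bytes: a Parquet file begins and ends with the four bytes `PAR1`. The sketch below uses only the standard library. The helper name `looks_like_parquet` and the sample files it creates are hypothetical and stand in for the file you downloaded.

```python
import os
import tempfile


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes b'PAR1'."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, os.SEEK_END)
        if f.tell() < 8:
            # Too short to hold both the leading and trailing magic bytes.
            return False
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"


# Demonstration with two throwaway files: one framed like Parquet, one plain JSON.
with tempfile.TemporaryDirectory() as tmp:
    fake_parquet = os.path.join(tmp, "sample.parquet")
    with open(fake_parquet, "wb") as f:
        f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")
    json_file = os.path.join(tmp, "sample.json")
    with open(json_file, "wb") as f:
        f.write(b'{"lastname": "Smith"}')
    parquet_result = looks_like_parquet(fake_parquet)
    json_result = looks_like_parquet(json_file)

print(parquet_result, json_result)  # True False
```

This only checks the file framing, not its contents. To view the rows locally, a library such as pandas (`pandas.read_parquet`, which needs a Parquet engine such as pyarrow installed) is one option.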

  8. You can use a free online Parquet viewer such as Parquet-viewer-online to see the file in a readable table format.

  9. This completes the workshop. Follow the next step to clean up the resources so that you don’t incur any cost after the workshop.