Important Note: You will create AWS resources during the exercise which will incur cost in your AWS account. It is recommended to clean-up the resources as soon as you finish the exercise to minimize the cost.

Working with Data API for Amazon Redshift

Amazon Redshift can be accessed using the built-in Data API. Applications can use Redshift Data API to enable data access, ingest, and egress. The Data API takes care of managing database connections and returning data. The Data API stores the query results for 24 hours and is asynchronous so one can retrieve the results later.

In this exercise, a Cloud9 based environment is used to talk to the Redshift Cluster using the Data API and Python as the programming language.

The AWS Resource consumption for the exercise falls under AWS Free Tier.

Step1: Pre-Requisite


You need to have an AWS account with administrative access to complete the exercise. If you don’t have an AWS account, kindly use the link to create free trial account for AWS.

Step2: Launch Redshift Cluster


Let’s start with launching a Redshift cluster which will be accessed using the Data API.

  1. Login to the AWS Console and choose Ireland as the region.

  2. Goto Redshift Management Console and click on the Create cluster button.

    Redshift Data API

  3. On the next screen, type in dojoredshift for the cluster identifier and select Free trial option.

    Redshift Data API

  4. On the same screen, in the Database configurations section, type in **Password1!” for the master user password field. Keep rest of the fields as the default. Finally click on the Create cluster in the bottom of the screen.

    Redshift Data API

  5. It will take a moment to create the Redshift cluster. Wait till the status of the cluster changes to Available.

    Redshift Data API

  6. Few things to remember about the cluster which will be useful later.

    cluster identifier: dojoredshift

    username: awsuser

    password: Password1!

    database: dev

  7. The cluster is ready. Let’s put some data in the cluster database dev.

Step3: Insert Data in Redshift Database


Let’s prepare some data in the Redshift cluster database so that we can query it later using the Data API.

  1. In the Redshift Management Console, click on the EDITOR menu in the left. It will open the Connect to database popup. Select Create new connection option. Select dojoredshift as the cluster. Type in dev for the database. Type in awsuser for the user and Password1! for the password. Finally click on the Connect to database button to connect to the cluster.

    Redshift Data API

  2. On the next screen, run the following SQL statement to create the countrybusinessindex table in the dev database.

    Redshift Data API

    create table countrybusinessindex
    (
    Country varchar(100),
    BusinessIndex int
    )
    

    `

  3. Next run the following SQL statement to insert data (4 records) into the countrybusinessindex table in the dev database. It is just creating a table and some records which can be accessed using the Data API. The data is sample and has no actual presentation or meaning.

    Redshift Data API

    insert into countrybusinessindex values
    ('Albania',1),
    ('Angola',1),
    ('Australia',8),
    ('Austria',6);
    

    `

  4. The sample data is ready. In the next step, you configure Amazon Secrets Manager to store Redshift cluster credentials so that it can be used by the Redshift Data API.

Step4: Configure Secrets Manager


Configure Amazon Secrets Manager to store Redshift credentials.

  1. Goto Secrets Management console and click on the Store a new secret button.

    Redshift Data API

  2. On the next screen, select Credentials for Redshift cluster option. Type in awsuser as the user name. Type in Password1! as the password. Select dojoredshift Redshift cluster. Keep rest of the configuration to the default. Click on the Next button.

    Redshift Data API

  3. On the next screen, type in dojosecret for the secret name. Click on the Next button.

    Redshift Data API

  4. On the next screen, keep the default configuration and click on the Next button.

  5. On the next screen, click on the Store button. The secret is created in no time. Open the secret details to make note of secret ARN. You need it later in the exercise.

    Redshift Data API

  6. The secret manager is ready. Let’s launch a SageMaker Notebook which you use for the Python programming.

Step5: Configure SageMaker Notebook


The exercise will use Amazon SageMaker Notebook as the client to call Redshift Data API. Before you launch the SageMaker Notebook, you create an IAM role which SageMaker Notebook will use.

  1. Goto the IAM Management console, click on the Roles menu in the left and then click on the Create role button.

    Redshift Data API

  2. On the next screen, select SageMaker as the service and click on the Next: Permissions button.

    Redshift Data API

  3. On the next screen, click on the Next: Tags button.

  4. On the next screen, click on the Next: Review button.

  5. On the next screen, type in dojosagemakerrole as the role name and click on the Create role button.

    Redshift Data API

  6. The role is created in no time. Open the dojosagemakerrole role details, remove AmazonSageMakerFullAccess policy and attach PowerUserAccess policy to the role.

    Redshift Data API

  7. The role is ready. Let’s launch the Notebook. Goto Amazon SageMaker console. Select Notebook instances in the left and then click on the Create notebook instance button.

    Redshift Data API

  8. On the next screen, type in dojodataapinotebook as the notebook instance name, select dojosagemakerrole as the IAM role. Leave rest of the configuration as the default and click on the Create notebook instance button.

    Redshift Data API

  9. The notebook instance launch starts. Wait till the status changes to InService.

    Redshift Data API

  10. The notebook is ready, let’s write Python code to use Data API for Redshift.

Step6: Programming with Data API


The notebook instance is ready. You now write code which calls Redshift Data API to make query from the database table.

  1. In the Amazon SageMaker console, click on the Open Jupyter link for the dojodataapinotebook instance.

    Redshift Data API

  2. It will open Jupyter in a new browser tab or window. Select conda_python3 option under the New menu. Basically, you are starting a notebook with Python3. Such notebook also comes with Python Boto3 SDK deployed which is used in calling Redshift Data API.

    Redshift Data API

  3. It will open a notebook in a new browser tab or window.

    Redshift Data API

  4. Since we are using a recently released API, let’s upgrade botocore and boto3 for the notebook. Run the command pip install botocore –upgrade in the notebook cell to upgrade botocore.

    Redshift Data API

  5. Next run the command pip install boto3 –upgrade in the notebook cell to upgrade boto3.

    Redshift Data API

  6. The botocore and boto3 upgrade requires the notebook kernel to be restarted. Click on the Restart option under Kernel menu to restart the kernel. It will ask for the confirmation in a popup message, click on the Restart button.

    Redshift Data API

  7. All set for the coding now. Copy-paste the following code in the cell in the notebook.

    Redshift Data API

    import boto3
    import time
    
    client = boto3.client('redshift-data')
    
    response = client.execute_statement(ClusterIdentifier = "dojoredshift", Database= "dev", SecretArn= "{SECRET-ARN}",Sql= "select * from countrybusinessindex")
    
    queryid = response["Id"];
    
    response = client.describe_statement (Id = queryid)
    while True:
        response = client.describe_statement (Id = queryid)
        if (response["Status"] == "FINISHED"):
            print("Query Status - " + response["Status"])
            break;
        else:
            print("Query Status - " + response["Status"])
            time.sleep(2)
    
    response = client.get_statement_result(Id = queryid)
    for datarecord in response["Records"]:
        print (datarecord)
    

    `

  8. In the code above, first of all, replace {SECRET-ARN} with the secret ARN you created to store Redshift credentials in the Secrets Manager.

  9. The code is simple. You make redshift-data client. Then call execute_statement method to run your query passing cluster id, database, secret ARN and SQL Statement as the parameters. The secret ARN helps in authentication and authorization of this API call using the Redshift credentials stored in Secrets Manager. The SQL statement for this exercise is simple and it is fetching all the records from the countrybusinessindex table. The execute_statement method returns a query id which is important for making any follow-up request related to the query. Since the query execution is asynchronous, the code uses describe_statement method to check the query status till the status changes to FINISHED. Once the query status changes to FINISHED, the code calls get_statement_result to get query results and prints them.

  10. Run the code. You can see the query status printed and finally the records are printed when the query status changes to FINISHED.

    Redshift Data API

  11. This was an example of how you can build client to call Redshift Data API to work with the data in the databases and tables. The exercise finishes here. Goto the next task to clean-up the resources so that you don’t incur any cost post the exercise.

Step7: Clean up


dojoredshift Redshift Cluster.

dojosecret Secret in Secrets Manager.

dojodataapinotebook SageMaker Notebook.

dojosagemakerrole IAM Role.

Thanks and hope you enjoyed the exercise.


Back to the Exercises