Using Lambda UDF with Amazon Redshift
Amazon Redshift supports user-defined function (UDF). The function is stored in the database and is available for any user with sufficient privileges to run. Amazon Redshift can use custom functions defined in AWS Lambda as part of SQL queries. Lambda UDFs are defined and managed in Lambda. One can control the access privileges to invoke these UDFs in Amazon Redshift. One can invoke multiple Lambda functions in the same query or invoke the same function multiple times.
The AWS Resource consumption for the exercise does not fall under AWS Free Tier.
Step1: Pre-Requisite
You need to have an AWS account with administrative access to complete the exercise. If you don’t have an AWS account, kindly use the link to create free trial account for AWS.
Step2: Create IAM Role.
You create IAM Role for the Redshift cluster which is used to provide access to the Lambda function.
-
Login to the AWS Console and choose Ireland as the region.
-
Goto the IAM Management console and click on the Roles menu in the left and then click on the Create role button.
-
On the next screen, select Redshift - Customizable as the service \ use case and click on the Next: Permissions button.
-
On the next screen, select PowerUserAccess as the policy and click on the Next: Tags button.
-
On the next screen, click on the Next: Review button.
-
On the next screen, type in dojoredshiftrole as the role name and click on the Create role button.
-
The role is created in no time. Please make note of the Role ARN as you need it later in the exercise. The next step is to launch the Redshift cluster.
Step3: Launch Redshift Cluster
You launch a Redshift cluster which used to test the UDF function defined in Lambda for the query purpose.
-
Goto Redshift Management Console and click on the Create cluster button.
-
On the next screen, type in dojoredshift for the cluster identifier and select Free trial option.
-
On the same screen, in the Database configurations section, type in Password1! for the master user password field. Keep rest of the fields as the default.
-
On the same screen, expand Cluster permissions (optional) section. Select dojoredshiftrole as the IAM Role and click on the Add IAM Role button (please make sure you click this button to add the role to the cluster). Finally click on the Create cluster in the bottom of the screen.
-
It will take a moment to create the Redshift cluster. Wait till the status of the cluster changes to Available.
-
Few things to remember about the cluster which will be useful later.
cluster identifier: dojoredshift
username: awsuser
password: Password1!
database: dev
-
The cluster is ready. You create table and insert data in the Redshift cluster database.
Step4: Insert Data in Redshift Database
In this step, you create a table in the database and then insert data into it.
-
In the Redshift Management Console, click on the EDITOR menu in the left. It will open the Connect to database popup. Select Create new connection option. Select dojoredshift as the cluster. Type in dev for the database. Type in awsuser for the user and Password1! for the password. Finally click on the Connect to database button to connect to the cluster.
-
On the next screen, run the following SQL statement one by one to create orderdetails table and then insert some sample data into it.
CREATE TABLE orderdetails ( orderNumber int, productCode varchar, quantityOrdered int, priceEach int, orderLineNumber int );
`
insert into orderdetails(orderNumber,productCode,quantityOrdered,priceEach,orderLineNumber) values (10100,'S18_1749',30,136,3), (10100,'S18_2248',50,55,2), (10100,'S18_4409',22,75,4), (10100,'S24_3969',49,35,1), (10101,'S18_2325',25,108,4), (10101,'S18_2795',26,167,1), (10101,'S24_1937',45,32,3), (10101,'S24_2022',46,44,2), (10102,'S18_1342',39,95,2), (10102,'S18_1367',41,43,1), (10103,'S10_1949',26,214,11);
`
-
This table and data is ready. You create a Lambda function in the next step which is used for the UDF function.
Step5: Create Lambda Function
You create Lambda function which is used to define the UDF function in the Redshift Database.
-
Goto the Lambda Console, click on the Functions menu in the left and then click on the Create function button.
-
On the next screen, select Author from scratch as the option. Type in dojolambdafunction as the name. Select Python 3.8 as the runtime. Select Create a new role with basic Lambda permissions as the option for the permissions. Finally click on the Create function button.
-
The function is created in no time. Next, you goto the Function code area and update the code with the code provided below.
import json def lambda_handler(event, context): ret = dict() res = [] for argument in event['arguments']: quantity = argument[0] price = argument[1] res.append(json.dumps(quantity * price)) ret['success'] = True ret['results'] = res return json.dumps(ret)
`
-
In the code above, the Lambda function receives quantity and price column values as the parameters. It returns an additional calculated column which has the value of quantity x price.
-
Click on the Deploy button to save and upload the lambda function code.
-
The Lambda function is ready. You now configure UDF function and use it in the next step.
Step6: Create Lambda UDF and Query Data
In this step, you configure and use UDF function in the Redshift database.
-
In the Redshift Management Console, click on the EDITOR menu in the left. It will open the Connect to database popup. Select Create new connection option. Select dojoredshift as the cluster. Type in dev for the database. Type in awsuser for the user and Password1! for the password. Finally click on the Connect to database button to connect to the cluster.
-
On the next screen, run the following SQL statement to create UDF function udf_dojolambdafunction. Before running the SQL, replace {ROLE-ARN} with role ARN you made note of in the previous steps.
CREATE OR REPLACE EXTERNAL FUNCTION udf_dojolambdafunction (quantity int, price int) RETURNS varchar STABLE LAMBDA 'dojolambdafunction' IAM_ROLE '{ROLE-ARN}';
`
-
The UDF function is ready. In the same query editor, run the following query to use the UDF function. The UDF function in reverse uses Lambda function to give the result. You query ordernumber, quantityordered, priceeach columns and then get an additional column totalprice using UDF function which has the value equal to quantityordered x priceeach.
select ordernumber, quantityordered, priceeach, udf_dojolambdafunction(quantityordered, priceeach) as totalprice from orderdetails
`
-
The query result looks like the following.
-
This finishes the exercise. It was a simple function to demonstrate the capability. You can create a very complex business logic using Lambda based UDF function. Follow the next step to clean up the resources so that you don’t incur any cost post the exercise.
Step7: Clean up
Delete dojoredshift Redshift Cluster.
Delete dojoredshiftrole IAM Role.
Delete dojolambdafunction Lambda Function.
Thanks and hope you enjoyed the exercise.