Introduction to Amazon Athena Federated Query
Amazon Athena is a serverless and interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Recently Amazon Athena introduced Federated Query which can be used to run SQL queries across data stored in relational, non-relational, object, and custom data sources. Athena uses data source connectors that run on AWS Lambda to execute federated queries. The data source connectors help connect with data sources like CloudWatch, DocumentDB, DynamoDB, HBase, JDBC data sources (like Redshift, MySql, SQL Server)etc.
In this exercise, you will use connector for the DynamoDB to keep the cost of the exercise low. But similar configuration can be used for other data sources as well.
Step1: Pre-Requisite
You need to have an AWS account with administrative access to complete the exercise. If you don’t have an AWS account, kindly use the link to create free trial account for AWS.
Step1: Create S3 Bucket and DynamoDB Table
You first start with Amazon S3 bucket creation which is later used in Amazon workgroup and data connector configuration. You also create a DynamoDB table which is used as the data source.
-
Login to AWS Console and choose Ireland as the region. At this point of time, the federated query feature is under preview and is available in selected regions such as Ireland.
-
Goto S3 Management console and use + Create bucket button to create a bucket with name dojo-athena-fed-query. If this bucket name is not available, create with a name which is available and use the name later in the configuration.
-
Next goto DynamoDB console and click on the Create table button.
-
On the next screen, type in dojotable as the table name. Type in id as the primary key and select Number as the data type. Keep rest of the configuration to the default and click on the Create button.
-
The table is created. Select the Items tab and then click on the Create item button.
-
On the Create item pop-up, change the format from Tree to Text.
-
Once format changed to Text, copy-paste the json document as shown below and then click on the Save button.
json document
{
"id": 1,
"firstname": "John",
"lastname": "Pedro"
}
-
The item is created in no time. You can see item listed.
-
Repeat the steps 5 to 7 to create couple of more items using the json documents below.
json documents
{
"id": 2,
"firstname": "Will",
"lastname": "Smith"
}
{
"id": 3,
"firstname": "Ryan",
"lastname": "Happy"
}
{
"id": 4,
"firstname": "Tom",
"lastname": "Brad"
}
- The source data is ready. Time to configure Athena data connector for DynamoDB.
Step3: Configure Athena Data Source
You now configure DynamoDB as the data source in Athena. Then you will be able to make SQL query to the DynamoDB table.
-
Goto Athena Console and click on the Data sources link.
-
On the next screen, click on the Connect data source button.
-
On the next screen, select Query a data source (beta) option. Select Amazon DynamoDB as the data source and then click on the Next button. You can see other options available as the data sources.
-
On the next screen, click on the Configure new AWS Lambda Function link button to deploy a prebuilt Lambda function which works as connector to the DynamoDB.
-
Click on Configure new AWS Lambda Function link button will open a new browser window or tab. On that new page, goto Application settings section. Type in dojo-athena-fed-query as the SpillBucket. If you created the bucket in the previous step with a different name, then use that bucket name. Type in dojocatalog as the AthenaCatalogName. Select I acknowledge that this app creates custom IAM roles option. Then click on the Deploy button.
-
The Lambda application deployment completes in a while. Wait till the status changes to CREATE_COMPLETE.
-
The Lambda connector is ready. Go back to the tab where you were configuring the data source for Athena. Click the refresh icon next to Lambda function field. Then select dojocatalog as the lambda function. Type in dojodb as the catalog name. Click on the Connect button.
-
The data source configuration completes in no time.
-
The data source is ready for the query. Since the federated query feature is under preview at present, you must create an Athena workgroup named AmazonAthenaPreviewFunctionality and join that workgroup in order to make query. Click on Workgroup : primary link on the top and then click on Create workgroup button.
-
On the next screen, type in AmazonAthenaPreviewFunctionality as the workgroup name. Select s3://dojo-athena-fed-query/ bucket as the Query result location. If you created bucket with a different name in the previous step; then use that one. Click on the Create workgroup button.
-
The workgroup is created in no time. Select AmazonAthenaPreviewFunctionality workgroup and click on the Switch workgroup button to join the workgroup.
-
You are all set to run the query now.
Step4: Run Query
Time to run query. So let’s not waste time.
-
Goto Athena Console and click on the Query editor link. Select dojodb as the data source and you will see dojotable from DynamoDB listed as the table.
-
Copy-paste the query shown in the query snippet below to the New query 1 pane. Then click on the Run query button.
query snippet
SELECT * FROM "dojodb"."default"."dojotable" limit 10;
-
The query result is shown which presents items from the DynamoDB table.
-
You can work with various other choices such as filter with the query. It is left to you.
Step5: One time configuration
The data source connector configuration is one time job. For instance, you can add / remove tables to the DynamoDB and it will be available for query in Athena. For instance, we added sampletable as the other table and it became available for the query in Athena without any more configuration.
The exercise is complete. Please follow the next step to clean-up the resources so that you don’t incur any cost post the exercise.
Step6: Clean up
Delete dojodb as data source from Athena.
Delete dojo-athena-fed-query S3 bucket. If you created bucket with a different name then delete that one.
Delete dojotable in the DynamoDB console.
Finally delete serverlessrepo-AthenaDynamoDBConnector stack in the CloudFormation console.
Thanks and hope you enjoyed the exercise.