Create Data Lake with Amazon S3, Lake Formation and Glue

2: Data and Users

Before you start the workshop, let’s understand the data and users configured for the data lake.

You will configure a data lake with two data sets - sales and customers. The data sets are stored in Amazon S3. AWS Glue and AWS Lake Formation are used to create the data lake. Finally, Amazon Athena is used to query the data sets.

The schemas of the data sets are as follows:

customers data set fields: {CUSTOMERID, CUSTOMERNAME, EMAIL, CITY, COUNTRY, TERRITORY, CONTACTFIRSTNAME, CONTACTLASTNAME}

sales data set fields: {ORDERNUMBER, QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES, ORDERDATE, STATUS, QTR_ID, MONTH_ID, YEAR_ID, PRODUCTLINE, MSRP, PRODUCTCODE, DEALSIZE, CUSTOMERID}
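
As a quick illustration, the two schemas above can be written as Python lists to check which field they have in common. This is only a sketch for exploring the schemas; the variable names are assumptions and are not part of the workshop.

```python
# Field lists copied from the schemas listed above.
CUSTOMERS_FIELDS = [
    "CUSTOMERID", "CUSTOMERNAME", "EMAIL", "CITY", "COUNTRY",
    "TERRITORY", "CONTACTFIRSTNAME", "CONTACTLASTNAME",
]
SALES_FIELDS = [
    "ORDERNUMBER", "QUANTITYORDERED", "PRICEEACH", "ORDERLINENUMBER",
    "SALES", "ORDERDATE", "STATUS", "QTR_ID", "MONTH_ID", "YEAR_ID",
    "PRODUCTLINE", "MSRP", "PRODUCTCODE", "DEALSIZE", "CUSTOMERID",
]

# Fields that appear in both data sets.
shared_fields = sorted(set(CUSTOMERS_FIELDS) & set(SALES_FIELDS))
print(shared_fields)  # prints ['CUSTOMERID']
```

CUSTOMERID is the only field present in both data sets, so it would be the natural key if you later want to relate sales records to customers.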

You will also create two users - salesuser and customersuser. These two users will access the data sets in the data lake, each with specific permissions.

salesuser can query all the fields of the sales data set, but nothing in the customers data set. customersuser can query only the CUSTOMERNAME and EMAIL fields of the customers data set.
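
The access rules above can be sketched in Python as a small lookup table, which makes it easy to reason about what each user may query. This is a minimal model, not how the workshop itself configures permissions. The helper names (`ACCESS`, `can_query`, `grant_request`), the database name `datalake_db`, and the account ID in the ARN are hypothetical. The dictionary returned by `grant_request` is shaped like the keyword arguments of the boto3 Lake Formation `grant_permissions` call, but nothing is sent to AWS here.

```python
# Sales fields copied from the schema in this section.
SALES_FIELDS = [
    "ORDERNUMBER", "QUANTITYORDERED", "PRICEEACH", "ORDERLINENUMBER",
    "SALES", "ORDERDATE", "STATUS", "QTR_ID", "MONTH_ID", "YEAR_ID",
    "PRODUCTLINE", "MSRP", "PRODUCTCODE", "DEALSIZE", "CUSTOMERID",
]

# user -> table -> columns that user may query (rules from this section).
ACCESS = {
    "salesuser": {"sales": SALES_FIELDS},
    "customersuser": {"customers": ["CUSTOMERNAME", "EMAIL"]},
}


def can_query(user, table, columns):
    """Return True if `user` may select every column in `columns` from `table`."""
    allowed = set(ACCESS.get(user, {}).get(table, []))
    return bool(columns) and set(columns) <= allowed


def grant_request(principal_arn, database, table, columns):
    """Build a column-level SELECT grant as a plain dict (no AWS call is made)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


print(can_query("salesuser", "sales", ["ORDERNUMBER", "SALES"]))  # True
print(can_query("salesuser", "customers", ["EMAIL"]))             # False
print(can_query("customersuser", "customers", ["EMAIL"]))         # True
print(can_query("customersuser", "customers", ["CITY"]))          # False

# Hypothetical ARN and database name, for illustration only.
request = grant_request(
    "arn:aws:iam::111122223333:user/customersuser",
    "datalake_db",
    "customers",
    ACCESS["customersuser"]["customers"],
)
```

If you experiment with different permissions later, changing the `ACCESS` table is a quick way to write down what you expect each user to see before checking it with a query.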

You are free to experiment further with the user permissions on your own.

Let’s start building now.