Custom Text Classification using Amazon Comprehend


2. Prepare Data

You will use sample data to train the model to classify text. The workshop uses a sample dataset from the Kaggle website. Download the data from the link. The sample data contains news titles, each classified as Real or Fake. It is a CSV file with the following fields:

Class - The label for the text, either Real or Fake

News Title - The news title text which is classified as Real or Fake

Please download the file to get familiar with the data and its format.
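One way to get familiar with the file is to count how many titles fall under each class. The sketch below is only an illustration: it assumes the first column holds the class and the second holds the news title, and it skips a header row if one is present. The function name summarize_labels and the sample rows are made up for this example and do not come from the Kaggle file.

```python
import csv
import io
from collections import Counter


def summarize_labels(csv_text):
    """Count how many rows carry each class label (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label = row[0].strip()
        if label.lower() == "class":
            continue  # skip a header row, if the file has one
        counts[label] += 1
    return counts


# Made-up rows for illustration only; assumed layout is "class,title".
sample = (
    "Class,News Title\n"
    'Real,"Council approves new budget, cuts fees"\n'
    "Fake,Aliens endorse mayor\n"
    "Real,Rain expected on Friday\n"
)
print(summarize_labels(sample))
```

To run it against the downloaded file, read the file contents first, for example with open("news_test.csv", newline="", encoding="utf-8").read(). The UTF-8 encoding is an assumption; adjust it if the file uses a different one.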

Next, upload the sample data file to an S3 bucket.

Log in to the AWS Console and choose Ireland (eu-west-1) as the region.

Create a bucket named dojo-text-records and upload the news_test.csv file into it. Bucket names must be globally unique, so if this name is not available, choose another name that is.
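The workshop performs this step in the console. If you prefer to script it, the following is a minimal sketch using boto3, assuming boto3 is installed and AWS credentials are already configured. The helper names create_bucket_args and upload_training_file are hypothetical, and eu-west-1 is the region code for Ireland. Regions other than us-east-1 need a LocationConstraint when the bucket is created, which is what the first helper handles.

```python
import os


def create_bucket_args(bucket_name, region):
    """Build keyword arguments for the S3 create_bucket call (hypothetical helper)."""
    args = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def upload_training_file(local_path, bucket_name, region="eu-west-1"):
    """Create the bucket and upload the CSV under its own file name."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_args(bucket_name, region))
    # The object key is the bare file name, e.g. news_test.csv.
    s3.upload_file(local_path, bucket_name, os.path.basename(local_path))


# Example call (not executed here):
# upload_training_file("news_test.csv", "dojo-text-records")
```

If the bucket name is already taken, create_bucket raises an error; pass a different name in that case, as you would in the console.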

The training data is ready. The next step is to build and train a model using this training data.