Custom Text Classification using Amazon Comprehend

   Go back to the Task List

  « 2. Prepare Data    4: Configure SageMaker Notebook »

3: Train the Model

You use the sample data loaded in the S3 bucket to train a model for text classification. The model can predict whether a news title text is Real or Fake.

  1. Goto the Amazon Comprehend console, click on the Custom classification menu in the left and then click on the Train classifier button.

    Amazon Comprehend

  2. On the next screen, type in dojotextclassifier for the name. Select English for the language. Select Using Multi-class mode option for the Classifier mode.

    Amazon Comprehend

  3. On the same screen, select CSV file for the Training data format. Select s3://dojo-text-records/news_test.csv for the S3 location. If you created bucket with a different name, then use that bucket.

    Amazon Comprehend

  4. On the same screen, select Create an IAM role option. Type in dojoclassifierrole for the name suffix. Finally, click on the Train classifier button.

    Amazon Comprehend

  5. It will start training the model. Wait till the status changes to Trained. It might take upto an hour to train the model.

    Amazon Comprehend

  6. Once the model is trained, on the classifier details screen, click on the Create endpoint button.

    Amazon Comprehend

  7. On the next screen, type in dojotextclassifierep as the endpoint name. Type in 1 for Inference units. Finally, click on the Create endpoint button.

    Amazon Comprehend

  8. It will throw pop up and ask for the confirmation, click on the Confirm button in the pop up. The endpoint creation will start. Wait till the status changes to Ready.

    Amazon Comprehend

  9. Once the endpoint is ready, make note of the endpoint ARN as you need it later in the workshop.

    Amazon Comprehend

  10. The model and endpoint are ready. Create SageMaker Notebook Instance in the next step which you use to call the endpoint.