Welcome to part 5 of the tutorial series on building a custom document classifier with AWS Comprehend. In the previous tutorial, we created the test document. In this tutorial, we classify that test document using the classifier we trained earlier in this series.

Here we are going to use two services.

  1. S3
  2. AWS Comprehend

First, we upload the test document (created in the previous tutorial) to an S3 bucket (comprehend-classifier in my case).
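If you prefer to script this step, the upload can also be done with boto3's S3 client. The sketch below is a minimal example, assuming boto3 is installed and AWS credentials are configured; `upload_test_document` is a hypothetical helper, and the client is passed in as a parameter so it can be replaced by a stub.

```python
from pathlib import Path
from typing import Optional


def upload_test_document(s3_client, local_path: str, bucket: str,
                         key: Optional[str] = None) -> str:
    """Upload a local file to S3 and return its s3:// URI.

    `s3_client` is assumed to be boto3.client("s3"); its
    upload_file(Filename, Bucket, Key) method performs the transfer.
    """
    # Default the object key to the file name, e.g. "test.csv".
    key = key or Path(local_path).name
    s3_client.upload_file(local_path, bucket, key)
    # This is the URI to enter later in the job's S3 location field.
    return f"s3://{bucket}/{key}"
```

For example, passing `boto3.client("s3")`, `"test.csv"` and `"comprehend-classifier"` would upload the file and return `s3://comprehend-classifier/test.csv`.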

Once the file is uploaded, navigate to Job management in the Comprehend console.

Under Job management, click Create job.

After clicking Create job, we have to configure a few details. The Job settings panel has two fields: Name and Analysis type. Name is the name of the job, so it can be anything. Analysis type offers two categories, Built-in and Custom; here we select Custom classification under the Custom category. Selecting Custom classification reveals a new field, Select classifier, where we choose the classifier by the name we gave it during training. In my case it's c1.
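In the API, the Select classifier dropdown corresponds to the classifier's ARN. The sketch below looks that ARN up by name, assuming a boto3 Comprehend client and the `DocumentClassifierName` filter of `list_document_classifiers`; `find_classifier_arn` is a hypothetical helper, and pagination (`NextToken`) is ignored on the assumption that a single name returns only a few entries.

```python
from typing import Optional


def find_classifier_arn(comprehend_client, name: str) -> Optional[str]:
    """Return the ARN of a trained classifier with the given name, or None.

    `comprehend_client` is assumed to be boto3.client("comprehend").
    """
    response = comprehend_client.list_document_classifiers(
        Filter={"DocumentClassifierName": name}
    )
    for props in response.get("DocumentClassifierPropertiesList", []):
        # Only a classifier that finished training can run a job.
        if props.get("Status") == "TRAINED":
            return props["DocumentClassifierArn"]
    return None
```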

The Input data section has two fields: S3 location and Input format. In S3 location, enter the S3 URI of test.csv (uploaded in the first step). Input format is optional, but in our case we set it to One document per line, since each line of test.csv is a separate document.
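In the API, these two fields become the `InputDataConfig` parameter of `StartDocumentClassificationJob`, where the console's format labels map to the values `ONE_DOC_PER_LINE` and `ONE_DOC_PER_FILE`. A small sketch of that mapping follows; `build_input_config` and the label dictionary are illustrative helpers, not part of any AWS SDK.

```python
# Console labels for the "Input format" field, mapped to the API values
# expected in InputDataConfig["InputFormat"].
INPUT_FORMATS = {
    "One document per line": "ONE_DOC_PER_LINE",
    "One document per file": "ONE_DOC_PER_FILE",
}


def build_input_config(s3_uri: str,
                       console_format: str = "One document per line") -> dict:
    """Build the InputDataConfig dict for a classification job."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    return {"S3Uri": s3_uri, "InputFormat": INPUT_FORMATS[console_format]}
```

With `ONE_DOC_PER_LINE`, each line of test.csv is classified as its own document, which is what the next tutorial's line-by-line validation relies on.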

In the Output data section, set S3 location to the bucket URI where the output should be saved.

The last section is IAM role. Here, we use the existing role that we created while training the classifier.
Note: When selecting the existing role, we need to explicitly grant that role write permission on S3 from the IAM management console.
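As a sketch of what that write permission might look like, the function below builds a minimal IAM policy document allowing `s3:PutObject` on the output bucket. The single-statement shape and the `s3_write_policy` name are assumptions for illustration; the role also needs read access to the input location (for example `s3:GetObject` and `s3:ListBucket`) if it does not already have it.

```python
import json


def s3_write_policy(bucket: str) -> str:
    """Return an IAM policy document (JSON) granting object writes to `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the Comprehend data-access role write job output objects.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting JSON can be pasted into the IAM console as an inline policy on the role, or attached with boto3's `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=...)`.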

The configuration will look like the screenshot below.

The job will take some time to predict the class of each document in the test file. Once it completes, Comprehend writes the predictions to the S3 output location we specified during configuration.
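The same job can also be started and monitored from code. A minimal sketch, assuming a boto3 Comprehend client and placeholder values for the job name, classifier ARN, S3 URIs and role ARN: `start_document_classification_job` submits the job, and `describe_document_classification_job` is polled until `JobStatus` leaves its transitional states. The helper names and the 30-second interval are arbitrary choices.

```python
import time

# Statuses in which the job is still moving toward a final state.
PENDING_STATUSES = ("SUBMITTED", "IN_PROGRESS", "STOP_REQUESTED")


def start_classification_job(comprehend_client, job_name, classifier_arn,
                             input_uri, output_uri, role_arn):
    """Start an asynchronous classification job and return its JobId."""
    response = comprehend_client.start_document_classification_job(
        JobName=job_name,
        DocumentClassifierArn=classifier_arn,
        InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        OutputDataConfig={"S3Uri": output_uri},
        DataAccessRoleArn=role_arn,
    )
    return response["JobId"]


def wait_for_job(comprehend_client, job_id, poll_seconds=30.0):
    """Poll until the job reaches a final status; return its properties."""
    while True:
        props = comprehend_client.describe_document_classification_job(
            JobId=job_id
        )["DocumentClassificationJobProperties"]
        if props["JobStatus"] not in PENDING_STATUSES:
            return props
        time.sleep(poll_seconds)
```

When the returned properties show `COMPLETED`, `props["OutputDataConfig"]["S3Uri"]` should point at the output archive in your bucket; `FAILED` usually comes with a `Message` explaining why (often a missing S3 permission).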

That is how you classify the test document using our custom-trained classifier. In the next tutorial, we will validate the predictions using the test_truth.csv file, which we created along with the test document.
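Before that validation, the predictions have to be read out of the job output. As far as we understand, Comprehend writes an `output.tar.gz` archive containing a `predictions.jsonl` file, where each line in multi-class mode carries the source `Line` number and a `Classes` list of `Name`/`Score` pairs. The sketch below assumes that layout and an archive already downloaded locally (for example with `s3_client.download_file(bucket, key, "output.tar.gz")`); `read_predictions` is a hypothetical helper that keeps the highest-scoring class for each line.

```python
import json
import tarfile


def read_predictions(archive_path: str) -> list:
    """Return sorted (line_number, top_class) pairs from an output.tar.gz."""
    results = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Assumed layout: predictions live in a .jsonl file in the archive.
            if not member.isfile() or not member.name.endswith(".jsonl"):
                continue
            with archive.extractfile(member) as fh:
                for raw in fh:
                    if not raw.strip():
                        continue
                    record = json.loads(raw)
                    # Keep the highest-scoring candidate class for this line.
                    best = max(record["Classes"], key=lambda c: c["Score"])
                    results.append((int(record["Line"]), best["Name"]))
    return sorted(results)
```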

You can learn more from the accompanying video. And don't forget to subscribe to the channel.

Until then, keep sharing and stay tuned for more. Follow me on Twitter.