Welcome to part 3 of the custom document classifier with AWS Comprehend tutorial series. In the previous tutorial, we prepared the training document. In this tutorial, we are going to train the custom document classifier.
Here we will use three services: Amazon S3, AWS Comprehend, and AWS IAM.
The flow is as follows: we upload the training document to an S3 bucket, then give Comprehend a reference to that document while configuring the classifier with the correct permissions.
First, we open the S3 management console and create a bucket. Once the bucket is created, we upload our train.csv document to it.
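If you prefer to script this step instead of using the console, here is a minimal sketch using boto3 (the AWS SDK for Python). It assumes boto3 is installed and AWS credentials are configured; the function names are hypothetical, and since S3 bucket names are globally unique you will need your own bucket name. Outside us-east-1, S3 requires a LocationConstraint when creating a bucket, which the first helper handles.

```python
import os


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build CreateBucket arguments; us-east-1 must omit LocationConstraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def s3_uri(bucket: str, key: str) -> str:
    """Return the s3://bucket/key URI that Comprehend expects for training data."""
    return f"s3://{bucket}/{key}"


def upload_training_file(bucket: str, region: str, path: str = "train.csv") -> str:
    """Create the bucket, upload the file, and return its S3 URI.
    Requires boto3 and AWS credentials (hypothetical helper)."""
    import boto3  # lazy import so the pure helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    key = os.path.basename(path)  # reuse the local file name as the object key
    s3.upload_file(path, bucket, key)  # upload_file(Filename, Bucket, Key)
    return s3_uri(bucket, key)


# Example (hypothetical bucket name):
# upload_training_file("my-comprehend-training-bucket", "us-east-1")
```

The returned URI is exactly what the Training data field asks for later in the console.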
Once the document is uploaded, we switch to the AWS Comprehend console. Under the Customization section, navigate to Custom classification and click Train classifier.
After clicking Train classifier, we have to configure a few details. Under Classifier settings, there are two fields: Name and Language. Name is simply the name of the classifier and can be whatever you like. For Language, we choose English in our case, but multiple languages are supported.
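These two console fields correspond to the DocumentClassifierName and LanguageCode parameters of the CreateDocumentClassifier API (English is "en"). Although the name is free-form, the API restricts which characters it may contain. The sketch below assumes the rule is letters, digits and hyphens, starting and ending with a letter or digit, up to 63 characters; please verify that against the current API reference. The function names are hypothetical.

```python
import re

# Assumed naming rule for Comprehend resources; check the API reference.
_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_NAME_LENGTH = 63


def is_valid_classifier_name(name: str) -> bool:
    """True if the name uses only letters, digits and hyphens, starts and
    ends with a letter or digit, and is at most 63 characters long."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


def classifier_settings(name: str, language_code: str = "en") -> dict:
    """Map the console's Name and Language fields to API parameter names."""
    if not is_valid_classifier_name(name):
        raise ValueError(f"invalid classifier name: {name!r}")
    return {"DocumentClassifierName": name, "LanguageCode": language_code}
```

Validating the name up front gives a clearer error than a rejected API call.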
Under Training data, enter the S3 URI of the train.csv document we uploaded in the S3 step. In my case, it is s3://comprehend-classifier/train.csv
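A typo in this URI is a common reason for a failed training job, so it can help to check it before submitting. Here is a small sketch using only the standard library. It splits an s3://bucket/key URI and builds the InputDataConfig dictionary that the CreateDocumentClassifier API takes; the helper names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not an s3://bucket/key URI: {uri!r}")
    return parsed.netloc, key


def input_data_config(uri: str) -> dict:
    """Validate the URI, then wrap it the way CreateDocumentClassifier expects."""
    parse_s3_uri(uri)  # raises on malformed input
    return {"S3Uri": uri}
```

For the URI above, parse_s3_uri returns ("comprehend-classifier", "train.csv").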
The last section is IAM role. Here, we select Create IAM role. Under Permission to access, we select the Training data bucket option (the role will only be able to access the training bucket). Finally, we type a string in the Name suffix field; it is appended to the name of the role that gets created. The configuration should look like the reference image below. Once everything is configured, click Train classifier.
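The console's Create IAM role option handles the IAM work for us. As a rough scripted equivalent, here is a sketch that creates a role Comprehend can assume, grants it read-only access to the training bucket only, and then starts training. It approximates what we assume the console generates rather than reproducing its exact policy. The role, policy, and function names are hypothetical, and boto3 plus credentials with IAM permissions are assumed.

```python
import json


def trust_policy() -> dict:
    """Trust policy that lets the Comprehend service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "comprehend.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def training_bucket_policy(bucket: str) -> dict:
    """Read-only permissions scoped to a single bucket (assumed to approximate
    the console's 'Training data bucket' option)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


def train_classifier(name: str, role_name: str, bucket: str, key: str,
                     region: str = "us-east-1") -> str:
    """Create the role, attach the bucket policy, and start training.
    Returns the new classifier's ARN. Requires boto3 and AWS credentials."""
    import boto3  # lazy import: the policy helpers above need no AWS access

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-training-data",  # hypothetical policy name
        PolicyDocument=json.dumps(training_bucket_policy(bucket)),
    )
    # A freshly created role can take a few seconds to become assumable;
    # if this call fails with an access error, wait briefly and retry.
    comprehend = boto3.client("comprehend", region_name=region)
    response = comprehend.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role["Role"]["Arn"],
        InputDataConfig={"S3Uri": f"s3://{bucket}/{key}"},
        LanguageCode="en",
    )
    return response["DocumentClassifierArn"]
```

Scoping the policy to one bucket mirrors the idea behind the Training data bucket option: the role can read the training data and nothing else.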
Training the classifier takes some time, and you can track its progress in the Status column. Once training completes successfully, we can open the classifier and check its details. It will look like this.
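Instead of refreshing the Status column, you can poll the DescribeDocumentClassifier API. The sketch below takes the describe call as a parameter so it can be tested without AWS. With boto3, you would pass a function that calls describe_document_classifier(DocumentClassifierArn=arn) on a Comprehend client. The set of terminal statuses is our reading of the API reference and should be treated as an assumption.

```python
import time
from typing import Callable

# Statuses after which training will not progress further (assumed list).
TERMINAL_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING", "IN_ERROR", "STOPPED"}


def wait_for_training(describe: Callable[[str], dict], arn: str,
                      poll_seconds: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll describe(arn) until the classifier reaches a terminal status,
    sleeping poll_seconds between checks, and return that status."""
    while True:
        status = describe(arn)["DocumentClassifierProperties"]["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

Injecting sleep keeps the function fast to test; in real use the default time.sleep with a poll interval of a minute or so is plenty, since training usually takes far longer than that.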
As we can see, it shows details such as the number of training documents and the number of classes, along with the classifier's performance metrics. We can see that the classifier has performed well on the test documents.
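The same details are available programmatically under ClassifierMetadata in the DescribeDocumentClassifier response. Here is a minimal sketch, assuming the response shape documented for boto3 (fields such as NumberOfTrainedDocuments and EvaluationMetrics). The summary keys and function name are our own.

```python
def summarize_classifier(response: dict) -> dict:
    """Pick the counts and headline metrics out of a
    DescribeDocumentClassifier response."""
    meta = response["DocumentClassifierProperties"]["ClassifierMetadata"]
    metrics = meta["EvaluationMetrics"]
    return {
        "classes": meta["NumberOfLabels"],
        "training_documents": meta["NumberOfTrainedDocuments"],
        "test_documents": meta["NumberOfTestDocuments"],
        "accuracy": metrics["Accuracy"],
        "precision": metrics["Precision"],
        "recall": metrics["Recall"],
        "f1": metrics["F1Score"],
    }
```

This is handy for logging the metrics of each training run so that runs on different versions of train.csv can be compared.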
Note: AWS Comprehend uses between 10 and 20 percent of the documents that you submit for training to test the custom classifier. You can learn more here.
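To get a feel for what the 10 to 20 percent figure means for your own dataset, here is a small calculation of the range of held-out test documents. It uses integer arithmetic to avoid floating-point rounding surprises. Rounding the lower bound down and the upper bound up is our own assumption; the actual count is reported as NumberOfTestDocuments once training finishes. The function name is hypothetical.

```python
from typing import Tuple


def held_out_range(total_documents: int, low_pct: int = 10,
                   high_pct: int = 20) -> Tuple[int, int]:
    """Approximate (min, max) number of documents held out for testing."""
    minimum = total_documents * low_pct // 100          # round down
    maximum = -(-total_documents * high_pct // 100)     # round up (ceil division)
    return minimum, maximum
```

For example, with 1,000 training documents roughly 100 to 200 would be used for testing, leaving about 800 to 900 for training.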
This is how we can train a custom classifier with the AWS Comprehend service. That's it for now. In the next tutorial, we will prepare the test document for creating a classification job using the classifier (model) that we just trained. Learn more hands-on about training the classifier with the video tutorial below.
Until then, keep sharing and stay tuned for more. Follow me on Twitter.