I trained a custom entity recognizer with AWS Comprehend for an entity extraction problem. The recognizer uses a default train/test data split, which in this case puts more documents in the test set than in the training set, and this affects the recognizer's metrics. These counts (the numbers of train and test documents) are also higher than the total number of inputs in the "train.csv" file I uploaded to the S3 bucket for training.

Total number of inputs in the CSV file: 1010
Train documents used by the recognizer: 2480
Test documents used by the recognizer: 3270
Can we specify the training and test document split percentage for an AWS Comprehend Custom Entity Recognizer?
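For context, here is a minimal sketch of how such a training job is started with boto3; the recognizer name, role ARN, bucket paths, and entity type below are placeholders, not values from the question:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Start a custom entity recognizer training job.
# Note: create_entity_recognizer exposes no parameter for the
# train/test split percentage; Comprehend chooses the split itself.
response = comprehend.create_entity_recognizer(
    RecognizerName="my-entity-recognizer",  # placeholder name
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendRole",  # placeholder ARN
    LanguageCode="en",
    InputDataConfig={
        "EntityTypes": [{"Type": "MY_ENTITY"}],  # placeholder entity type
        "Documents": {
            "S3Uri": "s3://my-bucket/train.csv",  # placeholder bucket/key
            # If InputFormat is omitted, it defaults to ONE_DOC_PER_LINE,
            # so every line of the file counts as a separate document.
        },
        "EntityList": {
            "S3Uri": "s3://my-bucket/entity_list.csv"  # placeholder bucket/key
        },
    },
)
print(response["EntityRecognizerArn"])
```

As far as I know, `create_entity_recognizer` has no split-percentage parameter; the closest control is the optional `TestS3Uri` field inside `Documents`, which lets you supply your own test set instead of relying on the automatic split.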
-
The question is not quite clear. – user269867 Jul 31 '19 at 00:02
-
@Navya I have the same issue. It seems Comprehend treats each line in the file as a separate training document, so your 1010 documents might contain multiple lines (2480 + 3270 = 5750). I am still puzzled by 2480 train documents versus 3270 test documents; usually it is the other way around, but it seems they have it upside down! – i.n.n.m Apr 03 '20 at 06:46
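One quick way to test the hypothesis in the comment above is to compare the line count of the training file against the document counts reported by the recognizer (the local file path is a placeholder):

```python
# Count lines in a local copy of the training file and compare with
# the train + test document counts reported by the recognizer.
with open("train.csv", encoding="utf-8") as f:  # placeholder local path
    line_count = sum(1 for _ in f)

reported_docs = 2480 + 3270  # train + test counts from the recognizer metadata
print(f"lines in train.csv:              {line_count}")
print(f"documents reported by Comprehend: {reported_docs}")
# If these match, Comprehend is using ONE_DOC_PER_LINE and splitting
# line by line rather than treating the CSV rows as whole documents.
```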