I'm not sure whether <category="Modifier"> should work, but as far as I know, the right way according to the Quickstart is to annotate as follows:
{"annotations": [
{"text_extraction": {"text_segment": {"end_offset": 85, "start_offset": 52}}, "display_name": "Modifier"},
{"text_extraction": {"text_segment": {"end_offset": 144, "start_offset": 103}}, "display_name": "Modifier"},
{"text_extraction": {"text_segment": {"end_offset": 391, "start_offset": 376}}, "display_name": "Modifier"},
{"text_extraction": {"text_segment": {"end_offset": 1008, "start_offset": 993}}, "display_name": "Modifier"},
{"text_extraction": {"text_segment": {"end_offset": 1137, "start_offset": 1131}}, "display_name": "SpecificDisease"}],
"text_snippet": {"content": "10021369\tIdentification of APC2, a homologue of the adenomatous polyposis coli tumour suppressor .\tThe ... APC - / - colon carcinoma cells . Human APC2 maps to chromosome 19p13 . 3. APC and APC2 may therefore have comparable functions in development and cancer .\n "}
}
After importing the dataset, the AutoML NL UI shows the five annotations specified in the JSONL.

For more detail on the JSONL structure of the example above, take a look at the sample files in the Quickstart:
$ gsutil cat gs://cloud-ml-data/NL-entity/dataset.csv
TRAIN,gs://cloud-ml-data/NL-entity/train.jsonl
TEST,gs://cloud-ml-data/NL-entity/test.jsonl
$ gsutil cat gs://cloud-ml-data/NL-entity/train.jsonl
If you are using the Python script for your own text strings, you will see that it generates a CSV file (dataset.csv) and JSONL files with content like:
{"text_snippet": {"content": "This is a disease\n Second line blah blabh"}, "annotations": []}
So you will need to specify the annotations yourself (using start_offset and end_offset), which can be a bit overwhelming to do by hand, or you can upload the CSV file in the AutoML UI and label the entities interactively.
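If you would rather compute the offsets than count characters by hand, something like the sketch below might help. It is not part of the Quickstart: build_record is a hypothetical helper, and it assumes offsets are counted in characters with an exclusive end_offset, which is what the first annotation in the example above (52 to 85) suggests. It also only finds the first occurrence of each phrase.

```python
import json


def build_record(content, entities):
    """Build one JSONL record from a text and a list of (phrase, label) pairs.

    Hypothetical helper: assumes character offsets with an exclusive end.
    """
    annotations = []
    for phrase, label in entities:
        start = content.find(phrase)  # first occurrence only
        if start == -1:
            raise ValueError(f"phrase not found in content: {phrase!r}")
        annotations.append({
            "text_extraction": {
                "text_segment": {
                    "start_offset": start,
                    "end_offset": start + len(phrase),
                }
            },
            "display_name": label,
        })
    return {"annotations": annotations, "text_snippet": {"content": content}}


content = ("10021369\tIdentification of APC2, a homologue of the "
           "adenomatous polyposis coli tumour suppressor .")
record = build_record(
    content, [("adenomatous polyposis coli tumour", "Modifier")]
)

# One JSON object per line is what goes into the .jsonl file.
line = json.dumps(record)
print(line)
```

For this snippet the computed segment is start_offset 52 and end_offset 85, the same values as the first annotation in the example above. If a phrase occurs more than once in your text, you would need to extend the lookup to pick the right occurrence.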