What may be happening
I'm going to guess that you are using the Default Configuration provided by Watson Discovery. The Default Configuration applies enrichments to a single field in the input data: the field named "text". The converters for HTML, PDF and Microsoft Word output the body of the document into the JSON field "text" by default. When you send JSON into Watson Discovery, no conversion is done: the field names pass straight through. So if your JSON has no top-level "text" field, the default enrichments have nothing to work on.
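If you want to check this before uploading, here is a minimal Python sketch that lists which JSON documents lack a non-empty top-level "text" string. The function name missing_text_field is hypothetical, and the check is only a heuristic based on the Default Configuration's source_field being "text"; it does not call Watson Discovery.

```python
from typing import Any


def missing_text_field(docs: list[dict[str, Any]]) -> list[int]:
    """Return indexes of documents with no non-empty top-level "text" string.

    Heuristic (hypothetical helper): the Default Configuration enriches only
    the "text" field, so these documents would get no default enrichments.
    """
    return [
        i
        for i, doc in enumerate(docs)
        if not (isinstance(doc.get("text"), str) and doc["text"].strip())
    ]


if __name__ == "__main__":
    sample = [
        {"text": "Watson enriches this one."},
        {"paragraphs": "This one has no top-level text field."},
    ]
    print(missing_text_field(sample))  # [1]
```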
What you can try
- Adjust your input JSON to have a top-level field named "text" that contains the text you want enriched.
- Create and use a custom configuration with one or more entries under "enrichments" whose "source_field" value is the name of the field in your JSON that you want Watson Discovery to enrich.
Watson Discovery Tooling can be very helpful for experimenting with custom configurations.
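For the first option, here is a minimal Python sketch of reshaping a document before upload so the text you want enriched ends up in a top-level "text" field. The helper name add_text_field is hypothetical; it assumes the source field holds either a string or a list of strings (joined with blank lines), and it overwrites any existing "text" value. It only prepares the JSON; sending it to Watson Discovery is not shown.

```python
import json
from typing import Any


def add_text_field(doc: dict[str, Any], source_field: str) -> dict[str, Any]:
    """Return a copy of doc with a top-level "text" field built from source_field.

    Hypothetical helper: assumes the value is a string or a list of strings.
    """
    value = doc[source_field]
    if isinstance(value, list):
        # Join list entries (e.g. separate paragraphs) with blank lines.
        value = "\n\n".join(str(item) for item in value)
    reshaped = dict(doc)  # shallow copy so the caller's dict is not mutated
    reshaped["text"] = str(value)  # overwrites any existing "text" value
    return reshaped


if __name__ == "__main__":
    original = {"title": "Report", "paragraphs": ["First paragraph.", "Second paragraph."]}
    print(json.dumps(add_text_field(original, "paragraphs"), indent=2))
```

Keeping the original field alongside the new "text" field means nothing in your existing JSON is lost; only the copy used for enrichment is added.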
Example
To make this concrete, here is the "enrichments" portion of the Default Configuration:
"enrichments": [{
  "destination_field": "enriched_text",
  "source_field": "text",
  "enrichment": "alchemy_language",
  "options": {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": true,
    "quotations": true
  }
}]
If your JSON has English text in a field named "paragraphs" and you would like Watson Discovery to provide enrichments for that field, you could use this configuration:
"enrichments": [{
  "destination_field": "enriched_paragraphs",
  "source_field": "paragraphs",
  "enrichment": "alchemy_language",
  "options": {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": true,
    "quotations": true
  }
}]
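If you need to enrich several fields, writing these entries by hand gets repetitive. Below is a minimal Python sketch that generates the "enrichments" portion for any list of field names, reusing the options from the Default Configuration shown above. The function names are hypothetical, and the "enriched_<field>" destination naming is an assumption borrowed from the "paragraphs" example, not a documented requirement. It only builds the JSON; you would still paste or upload it as part of a custom configuration (for example through Watson Discovery Tooling).

```python
import json
from typing import Any

# Options copied from the Default Configuration's enrichment entry above.
DEFAULT_OPTIONS: dict[str, Any] = {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": True,
    "quotations": True,
}


def enrichment_entry(source_field: str) -> dict[str, Any]:
    """Build one alchemy_language enrichment entry for source_field.

    Assumption: the destination is named "enriched_<field>", following the
    "paragraphs" example; any destination name you prefer could be used.
    """
    return {
        "destination_field": f"enriched_{source_field}",
        "source_field": source_field,
        "enrichment": "alchemy_language",
        "options": dict(DEFAULT_OPTIONS),  # copy so entries don't share one dict
    }


def enrichments_section(source_fields: list[str]) -> dict[str, Any]:
    """Return an "enrichments" section with one entry per field to enrich."""
    return {"enrichments": [enrichment_entry(f) for f in source_fields]}


if __name__ == "__main__":
    print(json.dumps(enrichments_section(["paragraphs"]), indent=2))
```

Calling enrichments_section(["paragraphs"]) yields the same entry as the hand-written "paragraphs" configuration, with true written as Python's True until it is serialized by json.dumps.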