0

Hi, I am trying to upload a JSON file as a data set in the Watson Discovery console. For PDFs and other accepted file formats, the Discovery application automatically creates data enrichments (keywords, entities, taxonomy, etc.). However, when I upload the data set as JSON, no enrichments are created.

Is there a particular JSON format that needs to be followed? And am I right that Discovery adds the enrichments on its own?

Sayuri Mizuguchi
aniket

2 Answers

1

What may be happening

I'm going to guess that you are using the Default Configuration provided by Watson Discovery. The Default Configuration applies enrichments to a single field in the input data, the field named text. The converters for HTML, PDF and Microsoft Word will, by default, output the body of the document into the JSON field text. When you send JSON into Watson Discovery, no conversion is done: the field names pass straight through, so unless your JSON already has a field named text, there is nothing for the default enrichments to work on.

What you can try

  1. Adjust your input JSON to have a top-level field named text which contains the text you want to be enriched.
  2. Create and use a custom configuration with one or more entries under enrichments whose source_field value is the name of the field in your JSON that you want Watson Discovery to enrich.
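
As a rough illustration of the first option, here is a minimal Python sketch that copies an existing field into a top-level text field before upload. The helper name with_text_field and the sample field names (title, paragraphs) are hypothetical, and joining a list of strings with newlines is an assumption about how your data is shaped.

```python
import json


def with_text_field(doc, source_field):
    """Return a copy of doc with the content of source_field exposed as top-level 'text'.

    Hypothetical helper: the original document is left unchanged.
    """
    value = doc[source_field]
    # Join a list of strings into one block of text; leave a plain string as is.
    if isinstance(value, list):
        value = "\n".join(str(item) for item in value)
    result = dict(doc)
    result["text"] = value
    return result


if __name__ == "__main__":
    # Sample document with made-up field names.
    original = {
        "title": "Example",
        "paragraphs": ["First paragraph.", "Second paragraph."],
    }
    print(json.dumps(with_text_field(original, "paragraphs"), indent=2))
```

The resulting JSON keeps the original fields and adds text, so the Default Configuration should find something to enrich.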

Watson Discovery Tooling can be very helpful for experimenting with custom configurations.

Example

To make this concrete, here is the enrichments portion of the Default Configuration:

"enrichments": [{
  "destination_field": "enriched_text",
  "source_field": "text",
  "enrichment": "alchemy_language",
  "options": {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": true,
    "quotations": true
  }
}]

If your JSON has English text in a field named paragraphs and you would like Watson Discovery to provide enrichments for that field, you could use this configuration:

"enrichments": [{
  "destination_field": "enriched_paragraphs",
  "source_field": "paragraphs",
  "enrichment": "alchemy_language",
  "options": {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": true,
    "quotations": true
  }
}]
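
If you have several fields to enrich, the same fragment can be generated rather than written by hand. The following is only a sketch: the function names enrichment_for and enrichments_section are hypothetical, and the options are simply copied from the Default Configuration shown above.

```python
import json

# Options copied from the Default Configuration fragment quoted in this answer.
DEFAULT_OPTIONS = {
    "extract": "keyword, entity, doc-sentiment, taxonomy, concept, relation",
    "sentiment": True,
    "quotations": True,
}


def enrichment_for(source_field):
    """Build one enrichment entry that reads source_field and writes enriched_<source_field>."""
    return {
        "destination_field": "enriched_" + source_field,
        "source_field": source_field,
        "enrichment": "alchemy_language",
        "options": dict(DEFAULT_OPTIONS),
    }


def enrichments_section(source_fields):
    """Build the 'enrichments' portion of a configuration for several fields."""
    return {"enrichments": [enrichment_for(name) for name in source_fields]}


if __name__ == "__main__":
    print(json.dumps(enrichments_section(["paragraphs"]), indent=2))
```

The printed fragment still has to be merged into a complete configuration before you send it to the service.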
Bruce Adams
0

You can upload documents either through the tooling interface or with cURL.

See one example (cURL) - Create a collection:

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "name": "test_collection",
  "description": "My test collection",
  "configuration_id": "{configuration_id}"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections?version=2016-12-01"

Set the header "Content-Type: application/json", replace {username} and {password} with the values from your Service Credentials, and put your environment_id in the URL.

Add a document:

curl -X POST -u "{username}":"{password}" -F file=@sample1.html "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents?version=2016-12-01"

Note: the file parameter is the document to ingest. The maximum supported file size is 50 megabytes; larger files are rejected. The API detects the document type, but you can specify it if the detection is incorrect. Acceptable MIME type values are application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, and application/xhtml+xml. To specify the content type, add type= followed by the MIME type to the file part of the multipart form.
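
For a JSON file, the upload command with an explicit content type could be assembled as in the sketch below. The helper name upload_command is hypothetical; the URL and the version date are taken from the cURL examples in this answer, and the ;type= suffix is curl's own syntax for setting the content type of a form part.

```python
def upload_command(username, password, environment_id, collection_id, path,
                   mime_type="application/json", version="2016-12-01"):
    """Return the curl argument list for uploading one document (hypothetical helper)."""
    url = (
        "https://gateway.watsonplatform.net/discovery/api/v1"
        f"/environments/{environment_id}/collections/{collection_id}"
        f"/documents?version={version}"
    )
    return [
        "curl", "-X", "POST",
        "-u", f"{username}:{password}",
        # ';type=' sets the MIME type of the multipart file part explicitly.
        "-F", f"file=@{path};type={mime_type}",
        url,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own credentials and IDs.
    cmd = upload_command("{username}", "{password}", "{environment_id}",
                         "{collection_id}", "sample1.json")
    print(" ".join(cmd))
```

Passing the returned list to subprocess.run runs the upload without any shell quoting issues.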

To update an existing configuration, for example with custom enrichments saved in my_config.json:

curl -X PUT -u "{username}":"{password}" -H "Content-Type: application/json" -d@my_config.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration_id}?version=2016-12-01"

See the official API Reference documentation.

Sayuri Mizuguchi