
I wonder what the best way is to format .DOC documents for the Retrieve and Rank web interface's document uploader so that it splits them into answers as well as possible. (I am using https://watson-retrieve-and-rank.ng.bluemix.net )

We have to create a set of documents, and I can't find any guide on how to format them to improve the automated answer splitting (for example, whether using a particular text size or bold for the title, the body of the answer, etc. makes a difference). The team creating those documents is not able to prepare them in the proper JSON format, and some of the DOC files are parsed by the service as a single one-page answer without any splitting.

Of course, maybe there is another tool for this task that I am missing.

Thanks for any experience or links.

icordoba

1 Answer


The tooling uses the default settings of the Document Conversion service, so the detailed documentation is here: https://www.ibm.com/watson/developercloud/doc/document-conversion/customizing.shtml#htmlau

To summarise, however: the tooling splits Word documents at every paragraph that uses a style named "Heading N", where "N" is a number.

This includes the existing built-in default styles in MS Word (i.e. "Heading 1", "Heading 2", "Heading 3", "Heading 4", "Heading 5", "Heading 6", "Heading 7", "Heading 8", "Heading 9"). It also includes styles that you create yourself with names following this pattern (e.g. "Heading 123").
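To make the rule concrete, here is a minimal Python sketch that simulates it on a list of (style name, paragraph text) pairs. It is not the Document Conversion service's own code: the `AnswerUnit` type, the `split_into_answer_units` function, and the choice to put any text before the first heading into an untitled unit are hypothetical illustrations of the rule as described above.

```python
import re
from dataclasses import dataclass, field

# Paragraph styles named "Heading N" (N = one or more digits) start a new unit.
# This covers built-in "Heading 1".."Heading 9" and custom names like "Heading 123".
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


@dataclass
class AnswerUnit:
    title: str  # text of the heading paragraph ("" if the unit has no heading)
    level: int  # N taken from "Heading N" (0 if the unit has no heading)
    body: list[str] = field(default_factory=list)


def split_into_answer_units(paragraphs):
    """Split (style_name, text) pairs, in document order, into answer units.

    Hypothetical simulation of the splitting rule, not the service's code.
    """
    units = []
    current = None
    for style, text in paragraphs:
        match = HEADING_STYLE.match(style)
        if match:
            # A "Heading N" paragraph opens a new answer unit.
            current = AnswerUnit(title=text, level=int(match.group(1)))
            units.append(current)
        else:
            if current is None:
                # Assumption: text before the first heading forms an untitled unit.
                current = AnswerUnit(title="", level=0)
                units.append(current)
            current.body.append(text)
    return units


if __name__ == "__main__":
    styled = [
        ("Heading 1", "How do I reset my password?"),
        ("Normal", "Open Settings and choose Reset."),
        ("Heading 2", "What if I forgot my email?"),
        ("Normal", "Contact support."),
    ]
    for unit in split_into_answer_units(styled):
        print(unit.level, unit.title, unit.body)

    # Titles made bold or large but left in the "Normal" style match no
    # "Heading N" name, so this sketch keeps everything in one unit.
    unstyled = [
        ("Normal", "How do I reset my password?"),
        ("Normal", "Open Settings and choose Reset."),
    ]
    print(len(split_into_answer_units(unstyled)))  # prints 1
```

The second case mirrors the problem in the question: if the authors format titles only with bold text or a bigger font while keeping the "Normal" style, no paragraph matches "Heading N", so under this rule the whole document stays a single answer. Applying a Word heading style to each title is what gives the splitter something to split on.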

dalelane
  • Thanks. Are those "Heading 1"... styles what you mean by the default settings? That is, are these the style names I have to use so that they are translated into H1, H2, ... in HTML? Sorry, but I can't find what the default style names and font sizes are in the default settings (for DOC -> HTML conversion). Thanks. – icordoba Aug 01 '16 at 14:28
  • These can be default styles or new styles that you create. I've edited my answer (see http://stackoverflow.com/posts/38699058/revisions ) to make this clearer. – dalelane Aug 01 '16 at 17:37