I want to use the Google Natural Language API to classify query results (documentation: "Classifying content").
The query results that I want to classify are available both as HTML and as plain text. The official documentation says that the API accepts both types: `Document.Type.PLAIN_TEXT` and `Document.Type.HTML`.
Because the HTML format carries additional annotations, e.g. `<b>important text</b>`, I am wondering which format gives the best possible classification result.
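
For reference, here is a minimal sketch of how I could submit the same result in both formats and compare the returned categories myself. It targets the REST method `documents:classifyText` using only the standard library. The helper names `build_classify_request` and `classify` are my own, and authenticating with an API key passed as `?key=...` is an assumption about my setup, not the only way to call the API.

```python
import json
import urllib.request

# REST endpoint of documents.classifyText (v1).
CLASSIFY_URL = "https://language.googleapis.com/v1/documents:classifyText"


def build_classify_request(content: str, doc_type: str) -> dict:
    """Build the JSON body; doc_type is "PLAIN_TEXT" or "HTML" (hypothetical helper)."""
    if doc_type not in ("PLAIN_TEXT", "HTML"):
        raise ValueError(f"unsupported document type: {doc_type}")
    return {"document": {"type": doc_type, "content": content}}


def classify(content: str, doc_type: str, api_key: str) -> dict:
    """POST the document and return the parsed JSON response.

    Assumes API-key authentication; other credential types need different headers.
    """
    body = json.dumps(build_classify_request(content, doc_type)).encode("utf-8")
    request = urllib.request.Request(
        f"{CLASSIFY_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `classify(html_result, "HTML", key)` and `classify(text_result, "PLAIN_TEXT", key)` on the same query result would let me compare the categories and confidence values side by side, but that only tells me what happens for my samples, not which format is preferable in general.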