
I'm trying to store a document with a string field in Vespa. When I use the document-api HTTP endpoint, the document is rejected with a parsing error. I've validated that the JSON being sent is correct (other documents go through fine).

Here is the error message that I'm seeing:

PARSER_ERROR Error in document 'id:x:y:n=1:1FVzo2l7mMLticB0WMkBKIECMLzAg' - could not parse field 'content' of type 'string': The string field value contains illegal code point 0xB

I can see that there's a check for these sorts of characters (a vertical tab in my case) in allowedAsciiChars in com.yahoo.text.Text, but I don't see anywhere in the documentation that I should be stripping these chars before sending to Vespa. In fact I see sort of the opposite situation where Vespa will go out of its way to replace certain chars behind the scenes without rejecting them.

2 Answers


Please strip ASCII control characters from the documents before feeding.

I will update the documentation, although it seems the JSON spec says these control characters must be escaped, so they are implicitly not allowed in the feed.
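For clients that want to do the stripping themselves, here is a minimal, dependency-free sketch in Java. The class and method names (ControlCharStripper, strip, isDisallowedControl) are made up for illustration and are not part of Vespa. It assumes that tab, line feed and carriage return are acceptable and that every other code point below 0x20 should be replaced with a space; the exact set Vespa rejects may be wider, so treat this as a starting point rather than the definitive rule.

```java
public final class ControlCharStripper {

    private ControlCharStripper() {}

    // Assumption: ASCII control characters below 0x20 are disallowed, except
    // tab, line feed and carriage return. Vespa's own check may cover more.
    static boolean isDisallowedControl(int codePoint) {
        return codePoint < 0x20
                && codePoint != '\t'
                && codePoint != '\n'
                && codePoint != '\r';
    }

    // Replaces each disallowed control character with a single space,
    // leaving all other code points (including non-ASCII text) untouched.
    public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp ->
                out.appendCodePoint(isDisallowedControl(cp) ? ' ' : cp));
        return out.toString();
    }

    public static void main(String[] args) {
        // The vertical tab (0xB) from the error message above.
        String dirty = "first line\u000Bsecond line";
        System.out.println(strip(dirty)); // prints "first line second line"
    }
}
```

Replacing with a space rather than deleting keeps the words on either side of the control character from running together, which matters if the field is tokenized for search.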

Kristian Aune

I see sort of the opposite situation where Vespa will go out of its way to replace certain chars behind the scenes

Where do you see this?

There is a Text.stripInvalidCharacters method, provided as a utility for Java clients that need to strip characters from non-sanitized text.

Jon
    I meant that in reference to linguistics processing like accent normalization, which of course is not really the same as control character handling. – user3230650 Jan 07 '19 at 19:14