Using a sequence of GenerateTableFetch, ExecuteSQL, SplitAvro, and ConvertAvroToJSON processors, I am fetching a JSON field from a MySQL view that has this content:
"A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D) is used.."

If I view the content of the flowfile in a queue and choose the "formatted" option (as opposed to "original"), I get this:
"A 7-point scale (1=“not at all” to 7=“extremely”) is used..."

And this unescaped string is what I would like to store in a NoSQL db. Is this built-in NiFi viewer using a function that I can tap into?

I am asking this because, later in the flow, I wrap the JSON within an XML tag in order to transform it to XML using an XSLT stylesheet. But I end up with the Unicode escape sequences after the transformation, and I would like to recover the original unescaped JSON before I store it in the NoSQL db.
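
For reference, `\u201C` is a standard JSON string escape, and any JSON parser decodes it the same way the formatted viewer appears to. A minimal Python sketch, outside NiFi, of the decoding in question (using the sample value above):

```python
import json

# The field value as it arrives from ConvertAvroToJSON, with the quotes
# still encoded as JSON \uXXXX escape sequences.
raw = '"A 7-point scale (1=\\u201Cnot at all\\u201D to 7=\\u201Cextremely\\u201D) is used..."'

# A standard JSON parser applies the unescaping rules, turning
# \u201C / \u201D into literal curly quotes.
decoded = json.loads(raw)
print(decoded)  # A 7-point scale (1=“not at all” to 7=“extremely”) is used...
```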

FernOfTheAndes
  • I am trying to avoid working with FlowFile attributes. I would like to work with the content. – FernOfTheAndes May 27 '20 at 17:56
  • 1
    `\u201C` is a json-encoded character in a string that represents `“`. so json formatted nifi-viewer decodes it to display. – daggett May 27 '20 at 20:48
  • @daggett Thanks, from this comment and the answer from Andy, the direction seems to be that I need to provide the conversion mappings myself...there are many other occurrences of quotes and em-dashes that I will need to preserve due to the authors' requirements. – FernOfTheAndes May 27 '20 at 21:11
  • But this is a correct JSON value. Why do you need to replace it? Are you going to store the JSON into NoSQL, or a text value taken from this JSON? – daggett May 27 '20 at 21:56
  • @daggett the string `\u201C` is showing in the NoSQL content...I am using the XSLT transform from https://stackoverflow.com/questions/13007280/how-to-convert-json-to-xml-using-xslt (the one that supports null) and I suspect it is converting the unicode sequence to a literal string with characters \u201C, otherwise the NoSQL db would have been able to unescape it properly – FernOfTheAndes May 28 '20 at 00:11
  • If you are going to use `ExecuteScript`, then think about converting JSON to XML with the help of a script; then you don't have to replace any chars yourself (a rough sketch of this idea follows these comments). – daggett May 28 '20 at 03:03
  • @daggett Agreed...this whole replacement of characters ended up being a fool's errand in this case...so, I went ahead and simplified the flow by skipping the replace and transform steps and storing the JSON in the db, and finally transforming it to XML within the db using its API functions. All the special chars were preserved. End of story. Thanks for thinking this through with me. Appreciated. – FernOfTheAndes May 28 '20 at 10:32
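
A rough sketch of the script-based JSON-to-XML idea from the comment above, assuming a flat JSON object and using only the Python standard library (the `record`/`scale_note` names are made up for illustration). Because the JSON is parsed before the XML is built, the `\u201C` escapes never survive as literal text:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record; the real flow pulls the value from a MySQL view.
raw = '{"scale_note": "A 7-point scale (1=\\u201Cnot at all\\u201D to 7=\\u201Cextremely\\u201D) is used..."}'

# Parsing first decodes the \u201C escapes, so there is nothing left to
# find-and-replace after the transformation.
data = json.loads(raw)

root = ET.Element("record")
for key, value in data.items():
    child = ET.SubElement(root, key)
    child.text = value if isinstance(value, str) else json.dumps(value)

# ElementTree escapes XML-significant characters (<, >, &) but leaves the
# curly quotes as real characters in the serialized output.
print(ET.tostring(root, encoding="unicode"))
```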

1 Answer

You can use a ReplaceText processor to replace all instances of the escape sequence (`\u201C`) in the flowfile content with `"`. If you need the leading and trailing quotes to be different, you can use ReplaceTextWithMapping to associate the different Unicode code points with their specific replacement values. If you don't, you can just use the generic ReplaceText, match `\u201[CD]`, and replace it with `"`.
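
As a plain-Python illustration of the regex above (in NiFi itself this would be a ReplaceText processor using the regex replacement strategy; the sketch only demonstrates the pattern against content where the escapes appear as literal six-character sequences):

```python
import re

# Content in which the escapes are literal six-character sequences, as in the raw value.
content = '"A 7-point scale (1=\\u201Cnot at all\\u201D to 7=\\u201Cextremely\\u201D) is used..."'

# Match \u201C or \u201D and replace both with a plain double quote,
# mirroring the generic ReplaceText approach described above.
replaced = re.sub(r'\\u201[CD]', '"', content)
print(replaced)  # "A 7-point scale (1="not at all" to 7="extremely") is used..."
```

Note that dropping a bare `"` into the middle of a JSON string makes the JSON itself invalid, which is part of why the comment thread on the question moves away from character replacement.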

Andy
  • Thanks Andy. I will use an `ExecuteScript` processor with a Python script to global-replace all the Unicode escape sequences I can identify (a rough sketch of that approach follows these comments). I really don't like having to take this approach since it can be rather tedious. I was hoping to borrow the services of the NiFi viewer. – FernOfTheAndes May 27 '20 at 21:17
  • 1
    Andy, your response is appropriate within the context of my question. Because I should have been more specific in the formulation of my question, I am referring anyone that comes here to my last comment to daggett above. – FernOfTheAndes May 28 '20 at 10:35
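
For anyone landing here, a rough sketch of the `ExecuteScript` route mentioned in the comments, assuming NiFi's Jython engine with its standard `session` / `REL_SUCCESS` bindings and Apache Commons IO available on the classpath. It lets a JSON parser do the unescaping rather than replacing sequences one by one:

```python
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class DecodeEscapes(StreamCallback):
    # Re-serialize the JSON content so \uXXXX escapes become real characters.
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        data = json.loads(text)                     # escapes are decoded here
        out = json.dumps(data, ensure_ascii=False)  # keep the literal characters
        outputStream.write(bytearray(out.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, DecodeEscapes())
    session.transfer(flowFile, REL_SUCCESS)
```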