We currently use StringEscapeUtils.escapeEcmaScript to escape quotes, tabs, and similar characters. It works for English, but for Japanese text it converts every Japanese character into a Unicode escape sequence. How can I keep the Japanese characters as they are while still escaping the special characters (quotes, tabs, etc.)?

Example:

System.out.println(StringEscapeUtils.escapeEcmaScript("Price must be between 1 and 3"));
System.out.println(StringEscapeUtils.escapeEcmaScript("で本を販売して 70% のロイヤリティを得るに"));
System.out.println(StringEscapeUtils.escapeEcmaScript("Der Preis muss zwischen angewendet werden kann."));

Output:

Price must be between 1 and 3
\u3067\u672C\u3092\u8CA9\u58F2\u3057\u3066 70% \u306E\u30ED\u30A4\u30E4\u30EA\u30C6\u30A3\u3092\u5F97\u308B\u306B
Der Preis muss zwischen angewendet werden kann.

It looks like this only happens with the Japanese text.

photosynthesis
  • Please provide an example of raw text and expected output. – Bohemian Jul 06 '17 at 20:45
  • (Apache Commons) `StringEscapeUtils` operates on Java *`String`s*. These are expressed in Unicode. If you are getting unwanted transcoding of your data then it presumably happens because you write them back out using an encoding different from the one with which you read them. – John Bollinger Jul 06 '17 at 20:46
  • More than example input and output, we'll need to see a [mcve] demonstrating the problem if you want our help solving it. – John Bollinger Jul 06 '17 at 20:47

1 Answer

StringEscapeUtils.escapeEcmaScript always escapes characters outside the range U+0020 to U+007F as Unicode escape sequences, so every Japanese character is converted.

If you don't want Japanese characters to be escaped, pass only the ASCII characters of the string to StringEscapeUtils.escapeEcmaScript() and leave all other characters unchanged:

package org.example;

import java.util.Arrays;
import java.util.stream.Collectors;

import org.apache.commons.text.StringEscapeUtils;

public class Test {
  public static void main(String[] args) {
      System.out.println(escapeEcmaScript("Price must be between 1 and 3"));
      System.out.println(escapeEcmaScript("で本を販売して 70% のロイヤリティを得るに"));
      System.out.println(escapeEcmaScript("Der Preis muss zwischen angewendet werden kann."));
      System.out.println(escapeEcmaScript("1'2\"/3"));
  }

  // Escapes the string one character at a time and joins the pieces back together.
  public static String escapeEcmaScript(String str) {
      return Arrays.stream(str.split(""))
              .map(Test::escapeCharacter)
              .collect(Collectors.joining());
  }

  // Only ASCII characters go through Commons Text; everything else is returned as-is.
  public static String escapeCharacter(String str) {
      if (str.matches("\\p{ASCII}")) {
          return StringEscapeUtils.escapeEcmaScript(str);
      } else {
          return str;
      }
  }
}

This produces the following output:

Price must be between 1 and 3
で本を販売して 70% のロイヤリティを得るに
Der Preis muss zwischen angewendet werden kann.
1\'2\"\/3
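
A different route is to skip Commons Text for this step and escape the relevant characters by hand in a single pass. The following is a minimal sketch, not the library's implementation: the class name UnicodePreservingEscaper and its escape method are hypothetical, and the set of escaped characters (quotes, backslash, slash, and control characters below U+0020) is assumed to mirror what escapeEcmaScript does for ASCII input. It additionally escapes U+2028 and U+2029, which older JavaScript engines treat as line terminators inside string literals; drop that check if you want every non-ASCII character passed through.

```java
// Hypothetical helper, not part of Apache Commons Text.
final class UnicodePreservingEscaper {

    private UnicodePreservingEscaper() {
    }

    // Escapes JavaScript-significant ASCII characters and control characters;
    // all other characters (including Japanese text) are copied unchanged.
    static String escape(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\'': out.append("\\'"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '/':  out.append("\\/"); break;
                case '\b': out.append("\\b"); break;
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                default:
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        // Remaining control characters and the two Unicode
                        // line separators become four-digit hex escapes.
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("で本を販売して 70% のロイヤリティを得るに"));
        System.out.println(escape("1'2\"/3"));
    }
}
```

Because this walks the string once and never splits it, surrogate pairs (for example emoji) also pass through unchanged.
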
SATO Yusuke