
The requirement is to read a file, modify its text based on a few rules, and write the result to a different file. The source file is UTF-8 encoded, and all special characters need to be converted to their HTML entity representation.

I wrote the following code to convert non-ASCII characters:

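    private String convertNonASCIIToDecimal(String ref) throws Exception
    {
        // Requires java.util.regex.Pattern and java.util.regex.Matcher.
        Pattern pat = Pattern.compile("[^\\p{ASCII}]");
        Matcher matcher = pat.matcher(ref);
        String temp;
        String tempSrc = ref;
        int charValue = 0;
        while (matcher.find())
        {
            temp = matcher.group();
            // codePointAt returns the full code point even for surrogate pairs.
            charValue = temp.codePointAt(0);
            logger.debug("As:" + temp + ":" + charValue);
            // replace() treats temp literally; replaceAll() would parse it as a regex.
            tempSrc = tempSrc.replace(temp, "&#" + charValue + ";");
        }
        return tempSrc;
    }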

I would like to know whether there is a better way, or an existing function, to do the same thing: convert non-ASCII characters to their HTML entity representation.
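For comparison, one possible single-pass alternative to the regex loop is sketched below. It assumes decimal numeric character references (such as `&#8220;`) are the desired output, and it walks the string by code point so that characters outside the Basic Multilingual Plane are encoded as one reference. The class name `HtmlEntityEncoder` and the method name `encodeNonAscii` are hypothetical, chosen only for illustration; they are not part of any library.

```java
public final class HtmlEntityEncoder {

    private HtmlEntityEncoder() {
        // Utility class (hypothetical name); not meant to be instantiated.
    }

    /**
     * Replaces every code point above 127 with a decimal numeric
     * character reference and copies ASCII characters unchanged.
     */
    public static String encodeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Read a whole code point so surrogate pairs stay together.
            int codePoint = input.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 char for BMP characters, 2 for surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

Under these assumptions, `HtmlEntityEncoder.encodeNonAscii("“How is Epistemology Political?”")` would return `&#8220;How is Epistemology Political?&#8221;`. Note that this sketch leaves ASCII markup characters such as `&` and `<` untouched, which may or may not be what the rules for the output file require.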

amj
  • Look at this StackOverflow question: http://stackoverflow.com/questions/599634/convert-html-character-back-to-text-using-java-standard-library – Rob Watts Apr 15 '13 at 21:38
  • The best way is to serve an HTML file encoded in UTF-8. HTML doesn't have to be in ASCII. – JB Nizet Apr 15 '13 at 21:40
  • Example input text would be "Alcoff, Linda Martin. “How is Epistemology Political?” ", with curly quotes, em dashes, and any other non-ASCII characters. It needs to be translated to ... &#8220;How is Epistemology Political?&#8221; – amj Apr 15 '13 at 21:58
  • There is typically no reason for this other than ignorance. What is the reason here? – Esailija Apr 16 '13 at 06:01
  • possible duplicate of [When escaping a string with HTML entities, can I safely skip encoding chars above Unicode 127 if I use UTF-8?](http://stackoverflow.com/questions/4943070/when-escaping-a-string-with-html-entities-can-i-safely-skip-encoding-chars-abov) – fglez Apr 17 '13 at 10:42

0 Answers