0

The reason I need to do this is the API of the StringUtils method in Apache Commons Lang:

StringUtils.replaceEach(String text, String[] searchList, String[] replacementList) 

What I want to do is replace all the HTML special character encodings with the actual special characters, which means the searchList and replacementList arrays will be pretty large. How can I do this in a way that is easy to read and maintain?

Yes, I could create two arrays, but then it would be very easy to make mistakes: how do I know I'm not missing a special encoding, that each entry is in the right position, and so on? I would much rather have code where the encoding and the character sit side by side to avoid such errors. I looked at a HashMap, but then you can only get the keys (encodings) and have to loop through to get the character values, which is not very performant, especially if it's going to run a lot. The same is true of a two-dimensional array that you have to split on every run.

Stephane Grenier
  • Not really an answer, but you are aware of https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html to escape/unescape HTML? – Erik Pragt Jul 09 '16 at 22:47
  • I would love to do that, and I even looked at JSoup. The problem is that I only want to replace some encodings, not all of them. I haven't been able to find a way to selectively replace the encodings :( – Stephane Grenier Jul 09 '16 at 22:54
  • Understood. In that case, I'd just make a List<Token>, where each Token consists of the item to search for and its replacement, and turn that List into the two arrays using some kind of function. If you populate the searchList and replacementList only once from the Token list, I'm sure you're pretty safe. – Erik Pragt Jul 09 '16 at 23:54
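
A minimal sketch of the Token-list idea from the comment above, assuming Java 8 or later. The `Token` and `EntityTable` names and the four sample entities are illustrative assumptions, not part of Commons Lang. Each encoding sits next to its character, and the two parallel arrays are built only once, in a static initializer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for one search/replacement pair (the name is illustrative).
final class Token {
    final String search;
    final String replacement;

    Token(String search, String replacement) {
        this.search = search;
        this.replacement = replacement;
    }
}

// Hypothetical table class: only the encodings listed here would be replaced.
final class EntityTable {
    // Sample entries only; each encoding is written beside its character.
    static final List<Token> TOKENS = Collections.unmodifiableList(Arrays.asList(
            new Token("&amp;", "&"),
            new Token("&lt;", "<"),
            new Token("&gt;", ">"),
            new Token("&quot;", "\"")));

    private static final String[] SEARCH_LIST = new String[TOKENS.size()];
    private static final String[] REPLACEMENT_LIST = new String[TOKENS.size()];

    static {
        // Runs once at class load: split the pairs into two parallel arrays.
        for (int i = 0; i < TOKENS.size(); i++) {
            SEARCH_LIST[i] = TOKENS.get(i).search;
            REPLACEMENT_LIST[i] = TOKENS.get(i).replacement;
        }
    }

    // Defensive copies so callers cannot change the cached arrays.
    static String[] searchList() {
        return SEARCH_LIST.clone();
    }

    static String[] replacementList() {
        return REPLACEMENT_LIST.clone();
    }
}
```

A method added to `EntityTable` could pass the cached arrays straight through, for example `StringUtils.replaceEach(text, SEARCH_LIST, REPLACEMENT_LIST)`, so no copying or splitting happens per call.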

2 Answers

2

What kind of performance are you aiming for? If you're looking to replace HTML special characters, could you build a HashMap of encodings to special characters, split it once, and cache the result in two static final arrays? You still pay the overhead of processing the HashMap, but only once, because saving the result means the split does not run on every call. Something like this:

import java.util.HashMap;
import java.util.Map;

// Commons Lang 3 package; Commons Lang 2.x uses org.apache.commons.lang.StringUtils
import org.apache.commons.lang3.StringUtils;

class MyStringReplaceClass {
  private static final String[] encodings;
  private static final String[] specialCharacters;

  static {
      Map<String, String> characterEncoding = new HashMap<String, String>();
      characterEncoding.put("...", "...");
      characterEncoding.put("...", "...");

      // Put other encodings here as necessary

      encodings = new String[characterEncoding.size()];
      specialCharacters = new String[characterEncoding.size()];

      // Split the map once into the two parallel arrays that replaceEach expects.
      // HashMap iteration order is unspecified, but each key stays paired with its value.
      int i = 0;
      for (Map.Entry<String, String> entry : characterEncoding.entrySet()) {
          encodings[i] = entry.getKey();
          specialCharacters[i] = entry.getValue();
          i++;
      }
  }

  // Static so it can be called without creating an instance
  public static String replaceEachEncoding(String text) {
      return StringUtils.replaceEach(text, encodings, specialCharacters);
  }
}

From here, you can call

MyStringReplaceClass.replaceEachEncoding(myText)

I'm not entirely sure if this meets your requirements exactly, but I feel a map of some sort with light processing would be the cleanest solution.

mkzh
  • That's pretty much what I came up with; you only pay the cost once, in the static block. It's too bad there's no built-in structure, but yes, that's the only way I came up with where I could see the information logically grouped together in the code. – Stephane Grenier Jul 13 '16 at 17:20
-1

Say the text has length N, contains M special characters, and the searchList has length K. With a HashMap, the number of compares is N*K and the number of exchanges is M.

For performance:
1. Create a tag for your search/replacement list, then scan through the text and tag each entry (record the indices). That takes N compares.
2. Now you have M indices to replace with K possible characters. That takes M*K compares and M exchanges.
In total, compares are N + M*K < N*K, and exchanges are M.
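
One possible reading of the two steps above, as a hedged sketch rather than the answerer's actual code. It assumes every search string starts with `&`, which serves as the "tag": the text is scanned once for that character (N compares), and only at those positions are the K candidates tried (M*K compares). The class name `TaggedUnescaper` and the three sample entities are illustrative assumptions.

```java
final class TaggedUnescaper {
    // Assumption: every search string begins with this character.
    private static final char TAG = '&';

    // Sample entries only; the two arrays are parallel (same index, same pair).
    private static final String[] SEARCH = {"&amp;", "&lt;", "&gt;"};
    private static final String[] REPLACEMENT = {"&", "<", ">"};

    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // One compare per character to find the tag.
            if (text.charAt(i) == TAG) {
                boolean replaced = false;
                // Only at a tagged position: try each of the K candidates.
                for (int k = 0; k < SEARCH.length; k++) {
                    if (text.startsWith(SEARCH[k], i)) {
                        out.append(REPLACEMENT[k]);
                        i += SEARCH[k].length();
                        replaced = true;
                        break;
                    }
                }
                if (replaced) {
                    continue;
                }
            }
            // Not a known encoding: copy the character unchanged.
            out.append(text.charAt(i));
            i++;
        }
        return out.toString();
    }
}
```

Calling `TaggedUnescaper.unescape(myText)` leaves encodings that are not in the table untouched, and replaced output is not scanned again, so the replacement is not recursive.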

Hope it helps!

Yogesh Luthra