7

I often have to URL-encode or URL-decode a large collection or array of strings. Besides iterating through them and calling the static URLDecoder.decode(string, "UTF-8"), are there any libraries out there that make this type of operation more performant?

A colleague insists that using the static method to decode the strings in-place is not thread safe. Why would that be?

jrws
  • Unless the static method is relying on static variables in the URLDecoder class, each method call goes on the stack separately and is thread safe. I don't see any reason why URLDecoder.decode(...) would need any access to shared resources. – Thomas May 01 '12 at 18:09

4 Answers

8

The JDK's URLDecoder isn't implemented efficiently. Most notably, it relies internally on StringBuffer, which introduces synchronization that URLDecoder doesn't need. Apache Commons Codec provides URLCodec, but it has been reported to have similar performance issues; I haven't verified whether that's still the case in the most recent version.

Mark A. Ziesemer wrote a post a while back regarding the issues and performance with URLDecoder. He logged some bug reports and ended up writing a complete replacement. Because this is SO, I'll quote some key excerpts here, but you should really read the entire source article here: http://blogger.ziesemer.com/2009/05/improving-url-coder-performance-java.html

Selected quotes:

Java provides a default implementation of this functionality in java.net.URLEncoder and java.net.URLDecoder. Unfortunately, it is not the best performing, due to both how the API was written as well as details within the implementation. A number of performance-related bugs have been filed on sun.com in relation to URLEncoder.

There is an alternative: org.apache.commons.codec.net.URLCodec from Apache Commons Codec. (Commons Codec also provides a useful implementation for Base64 encoding.) Unfortunately, Commons' URLCodec suffers some of the same issues as Java's URLEncoder/URLDecoder.

...

Recommendations for both the JDK and Commons:

When constructing any of the "buffer" classes, e.g. ByteArrayOutputStream, CharArrayWriter, StringBuilder, or StringBuffer, estimate and pass-in an estimated capacity. The JDK's URLEncoder currently does this for its StringBuffer, but should do this for its CharArrayWriter instance as well. Common's URLCodec should do this for its ByteArrayOutputStream instance. If the classes' default buffer sizes are too small, they may have to resize by copying into new, larger buffers - which isn't exactly a "cheap" operation. If the classes' default buffer sizes are too large, memory may be unnecessarily wasted.

Both implementations are dependent on Charsets, but only accept them as their String name. Charset provides a simple and small cache for name lookups - storing only the last 2 Charsets used. This should not be relied upon, and both should accept Charset instances for other interoperability reasons as well.

Both implementations only handle fixed-size inputs and outputs. The JDK's URLEncoder only works with String instances. Commons' URLCodec is also based on Strings, but also works with byte[] arrays. This is a design-level constraint that essentially prevents efficient processing of larger or variable-length inputs. Instead, the "stream-supporting" interfaces such as CharSequence, Appendable, and java.nio's Buffer implementations of ByteBuffer and CharBuffer should be supported.

...

Note that com.ziesemer.utils.urlCodec is over 3x as fast as the JDK URLEncoder, and over 1.5x as fast as the JDK URLDecoder. (The JDK's URLDecoder was faster than the URLEncoder, so there wasn't as much room for improvement.)
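On the Charset recommendation quoted above: since Java 10 the JDK itself offers overloads of URLEncoder.encode and URLDecoder.decode that take a Charset instance instead of a charset name. That avoids the name lookup and the checked UnsupportedEncodingException. A minimal sketch, assuming Java 10 or later; the class and method names are only for illustration:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class CharsetOverloadDemo {
    // Encodes using a Charset instance; no charset-name lookup, no checked exception.
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    // Decodes using the matching Charset overload.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("a b&c=d");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```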

I think your colleague is wrong to suggest that URLDecoder is not thread-safe. Other answers here explain why in detail.

EDIT [2012-07-03] - Per later comment posted by OP

Not sure if you were looking for more ideas or not? You are correct that if you intend to operate on the list as an atomic collection, then you would have to synchronize all access to the list, including references outside of your method. However, if you are okay with the returned list contents potentially differing from the original list, then a brute force approach for operating on a "batch" of strings from a collection that might be modified by other threads could look something like this:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

/**
 * @param origList will be copied by this method so that origList can continue
 *                 to be read and written by other threads.
 * @return list containing decoded strings for each entry that was
 *         in origList at the time of the copy.
 */
public List<String> decodeListOfStringSafely(List<String> origList)
        throws UnsupportedEncodingException {
    // Take a snapshot first and iterate over the copy, not the shared list.
    List<String> snapshotList = new ArrayList<String>(origList);
    List<String> newList = new ArrayList<String>(snapshotList.size());

    for (String urlStr : snapshotList) {
        String decodedUrlStr = URLDecoder.decode(urlStr, "UTF-8");
        newList.add(decodedUrlStr);
    }

    return newList;
}

If that does not help, then I'm still not sure what you are after and you would be better served to create a new, more concise, question. If that is what you were asking about, then be careful because this example out of context is not a good idea for many reasons.

Community
kaliatech
  • Thank you. I probably should have broken this into two separate questions: one about a Java library for encoding/decoding whole collections or arrays of strings and one about the thread safety issue. WRT Apache's URLCodec, this seems to still only work on one string or object at a time. The performance comparisons you showed are helpful. – jrws May 03 '12 at 13:59
  • WRT the thread safety issue, I should have provided more context (or left the issue for another question, please forgive the noob). As far as it goes, the thread safety issue came up because the collection on which to perform the in-place substitution of the values is a method argument: public foo(List strings). The collection reference itself is passed by value, but the objects it references are still shared objects, so it seems to bear out that synchronization is the safest thing to do, since I have no control over the caller's usage of this collection. Something like this... – jrws May 07 '12 at 15:35
  • ...public foo(List strings){List decodedStrings = Collections.synchronizedList(strings); synchronized(decodedStrings) {for(String decodedString : decodedStrings) {decodedString = URLDecoder.decode(decodedString, "UTF-8")}};} – jrws May 07 '12 at 15:44
  • It's still not clear what you intend to do. (Your example does nothing with the decodedString.) You cannot decode the strings "in-place" because Java strings are immutable. Perhaps you intend to replace the given collection's string references with newly decoded strings at the same indexes in the list? (Generally not a good design idea.) Perhaps you intend to return a new collection of decodedStrings? If so, then your example is still not thread-safe because ALL uses of the list must be wrapped in a synchronized block. A better approach would be to make a copy of the list before iterating. – kaliatech May 07 '12 at 17:08
  • You are accurately calling out the flaw in my question - I should not have put the thread-safety question on top of this one. I am really looking for an answer about a library that will "batch" decode/encode a collection of strings, and so far no one has come up with one. WRT your response - let's assume that yes, all that is being done after this example is that the List with new String references is returned - then thread-safety for the collection is not assured in this example, and the caller of foo will have to synchronize his call to foo and other refs to the list passed to it. – jrws Jul 03 '12 at 14:00
  • Edited answer with an example per your comment... I think. If still not helpful, then you should create a new question. – kaliatech Jul 03 '12 at 18:39
  • Thanks - the amended example is iterating over the list and changing each item one at a time, using the static method in URLDecoder. The original question should not have included the question about thread safety. I am looking for a library that could encode/decode a collection of strings in a single call - it appears that such a library is not available, and you and I would implement a solution in essentially the same way. Thank you for your responses, and I apologize about making the original question indeterminate. – jrws Aug 27 '12 at 16:10
0

Apache has URLCodec, which can be used for encoding and decoding.

If your static method works only on local variables or on final variables that are initialized once, then it is completely thread safe.

Parameters live on the stack, so they are thread safe, and final constants are immutable and hence cannot be changed.

The following code is completely thread safe:

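public static String encodeMyValue(String value) {
    // do the encoding here, using only the parameter and local variables
    return value; // placeholder: return the encoded result instead
}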

Care should be taken if a final variable refers to a mutable object: you cannot reassign the variable, but you can still change the object's internal state (its properties).
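To illustrate that caveat, here is a small sketch in which all names are hypothetical and the Charset overload assumes Java 10 or later. The static final field cannot be reassigned, yet the list it refers to is shared, mutable state. As a result decodeAndRemember is not thread safe, while decodeOnly, which touches only its parameter and local values, is.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FinalButMutable {
    // final: the reference cannot change, but the ArrayList behind it can.
    private static final List<String> SEEN = new ArrayList<>();

    // Not thread safe: concurrent callers mutate the shared ArrayList.
    static String decodeAndRemember(String value) {
        String decoded = URLDecoder.decode(value, StandardCharsets.UTF_8);
        SEEN.add(decoded);
        return decoded;
    }

    // Thread safe: only the parameter and local values are used.
    static String decodeOnly(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    static int seenCount() {
        return SEEN.size();
    }
}
```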

user100464
mprabhat
0

Thread safety is actually never really a concern with static functions (or else it is a design failure), especially not if they don't even access static variables in the class.

I would suggest using the function you used before and iterating through the collection.
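A minimal sketch of that approach, wrapped in a helper so that callers can decode a whole collection in one call. BatchUrlDecoder and decodeAll are hypothetical names, not part of any library, and the Charset overload assumes Java 10 or later:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class BatchUrlDecoder {
    /** Returns a new list with each entry decoded as UTF-8; the input is not modified. */
    static List<String> decodeAll(Collection<String> encoded) {
        List<String> decoded = new ArrayList<>(encoded.size());
        for (String s : encoded) {
            decoded.add(URLDecoder.decode(s, StandardCharsets.UTF_8));
        }
        return decoded;
    }
}
```

A call such as BatchUrlDecoder.decodeAll(java.util.Arrays.asList("a%20b", "c%2Fd")) then returns a fresh list and leaves the caller's collection untouched.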

dlaxar
0

Basically there's no magic thread-safety applied to static methods or instance methods or constructors. They can all be called on multiple threads concurrently unless synchronization is applied. If they don't fetch or change any shared data, they will generally be safe - if they do access shared data, you need to be more careful.

So in your case, you can write a synchronized method that wraps the URL decoding or encoding, which lets you enforce thread safety externally.
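As a rough sketch of what such external synchronization could look like: the decoder call itself needs no lock, so the lock below guards only the shared list. This is safe only if every other thread that reads or writes the list synchronizes on the same object. The names are hypothetical, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class SynchronizedDecode {
    /** Replaces each element of the shared list with its decoded form while holding the list's monitor. */
    static void decodeInPlace(List<String> shared) {
        synchronized (shared) {
            shared.replaceAll(s -> URLDecoder.decode(s, StandardCharsets.UTF_8));
        }
    }
}
```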

Subhrajyoti Majumder