
I've been running into a lot of JAXB serialization errors caused by code creating invalid qualified names in various places. I'm investigating the API I'm using and other Java XML options, and one strange thing is that the classes that implement qualified names don't appear to do any input checking at all.

This is really problematic, because complex code generates various JAXB objects, and it's not until marshalling time that you find out something has gone horribly wrong. The exception stack trace typically doesn't tell you which element or attribute is wrong, just that something is.

Wouldn't it make more sense for these libraries to make it more difficult to create un-serializable content in the first place?

Here's a code snippet: why does this work? Shouldn't it throw an IllegalArgumentException? In other APIs which define QName, the behavior is the same. The javadocs for this class specify that you'll get an IllegalArgumentException if the local part or the prefix is null (a null namespace URI is silently treated as the empty namespace), but not otherwise.

    QName q = new QName("Namespace URI is supposed to be an anyURI, but clearly !!THIS ISN'T!!", 
                        "Local part is supposed to be an NCName, but clearly !!THIS ISN'T!!",
                        "<><><><>&&& Laughably Invalid Namespace Prefix");
    System.out.println(q);

References: the relevant javadoc for QName, and the spec constraints stating that the namespace name is an anyURI and the local part is an NCName. In other words, according to the spec, the code above is blatantly invalid, irrespective of serialization.
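
To make the behavior concrete, here is a minimal sketch of what the three-argument constructor does and does not reject, as far as I can tell from the javadoc. The class name `QNameLeniencyDemo` and the helper `rejects` are made up for illustration; only `javax.xml.namespace.QName` comes from the JDK.

```java
import javax.xml.namespace.QName;

class QNameLeniencyDemo {
    /** Returns true if new QName(namespaceURI, localPart, prefix) throws IllegalArgumentException. */
    static boolean rejects(String namespaceURI, String localPart, String prefix) {
        try {
            new QName(namespaceURI, localPart, prefix);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Syntactically invalid values are stored as-is, with no complaint.
        System.out.println(rejects("not a URI", "not an NCName", "<&>"));

        // A null namespace URI is accepted and replaced with the empty string.
        System.out.println(rejects(null, "local", "p"));
        System.out.println(new QName(null, "local").getNamespaceURI().isEmpty());

        // Only a null local part or a null prefix is rejected.
        System.out.println(rejects("urn:example", null, "p"));
        System.out.println(rejects("urn:example", "local", null));
    }
}
```

So the only checks are null checks; nothing about the syntax of any of the three parts is inspected.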

FrobberOfBits

2 Answers


Hypothesizing here.

The primary user of the QName constructor is quite likely not general-purpose Java code but the XML parser.

XML parsing is performance sensitive. If the constructor is being called by the parser, in theory the parser has already validated the syntax, so validating it again is a waste of time.

It's a 95% issue: why make 95% of callers pay the price for the 5% who pass unvalidated names?

If you want a validating QName constructor, you can extend QName and add your own code.

For example:

    public class VerifiedQName extends QName {
        public VerifiedQName(String namespaceURI, String localPart) {
            super(namespaceURI, localPart);
            verifyNamespaceURI(namespaceURI); // throws IllegalArgumentException
            verifyLocalPart(localPart);  // throws IllegalArgumentException
        }
        ...
    }
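
One possible way to fill in the two verification methods is sketched below. This is an assumption-laden illustration, not a spec-complete validator: the namespace check borrows `java.net.URI` parsing as a stand-in for anyURI (the two grammars are not identical), and the local-part check is a simplified approximation of NCName built on `Character.isLetter` and `Character.isLetterOrDigit` rather than the exact character ranges in the XML spec. The helpers are static and called inside the `super(...)` arguments so that validation runs before the QName is constructed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.namespace.QName;

class VerifiedQName extends QName {
    VerifiedQName(String namespaceURI, String localPart) {
        // The static helpers run while the arguments are evaluated,
        // so an invalid value aborts construction.
        super(verifyNamespaceURI(namespaceURI), verifyLocalPart(localPart));
    }

    /** Approximates the anyURI check with java.net.URI parsing (an assumption). */
    static String verifyNamespaceURI(String namespaceURI) {
        if (namespaceURI == null) {
            return null; // QName itself maps null to the empty namespace
        }
        try {
            new URI(namespaceURI);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid namespace URI: " + namespaceURI, e);
        }
        return namespaceURI;
    }

    /** Simplified NCName check: letter or '_' first, then letters, digits, '_', '-', '.'. */
    static String verifyLocalPart(String localPart) {
        if (localPart == null || localPart.isEmpty()) {
            throw new IllegalArgumentException("Local part must not be null or empty");
        }
        char first = localPart.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            throw new IllegalArgumentException("Invalid first character in local part: " + localPart);
        }
        for (int i = 1; i < localPart.length(); i++) {
            char c = localPart.charAt(i);
            boolean ok = Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
            if (!ok) { // this also rejects ':' and whitespace
                throw new IllegalArgumentException("Invalid character '" + c + "' in local part: " + localPart);
            }
        }
        return localPart;
    }

    public static void main(String[] args) {
        System.out.println(new VerifiedQName("urn:example:ns", "item"));
        try {
            new VerifiedQName("urn:example:ns", "Local part with spaces");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With something like this, the failure surfaces at the line that builds the bad name instead of at marshalling time.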

At least the class is documented as not validating.

Will Hartung

javax.xml.namespace.QName isn't a JAXB class, it is just a class that is part of Java SE that JAXB leverages.

Reasons Not to Verify the Data

  1. String manipulation is expensive. Do you want to take the hit of inspecting every String every time?
  2. What if the definition of a valid local name or namespace changes? Would you then need to tell QName which version of the definition to validate against? Would you be unable to use the new definition until the version in Java SE is updated to support it?
  3. In a pinch you can use it with non-XML data. Some JAXB implementations (e.g. MOXy) support formats other than XML (e.g. JSON), and having classes like QName be tolerant of "invalid" data allows them to be used in ways other than intended.

UPDATE

Do you know that these are the reasons, or are you speculating?

Pure speculation, but I do lead the EclipseLink MOXy implementation of JAXB.

Why would String checking be expensive in a constructor?

It's more expensive than not doing anything. Also even if it is a small cost per constructor call, perform the operation enough and it can become expensive.

It's worth mentioning that other types related to the spec (like URL and URI) both throw exceptions out of their constructors for invalid formats.

You generally don't create a lot of these objects, so the user assistance from the exception outweighs the cost of doing the check.
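
The contrast can be seen in a small sketch like the following. The class and helper names (`ConstructorValidationDemo`, `uriAccepts`, `qnameAccepts`) are invented for illustration; the only JDK types used are `java.net.URI` and `javax.xml.namespace.QName`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.namespace.QName;

class ConstructorValidationDemo {
    /** True if java.net.URI parses the string without complaint. */
    static boolean uriAccepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** True if QName accepts the string as a local part. */
    static boolean qnameAccepts(String localPart) {
        try {
            new QName(localPart);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bad = "has spaces <and> brackets";
        System.out.println(uriAccepts(bad));   // prints false: URI validates in its constructor
        System.out.println(qnameAccepts(bad)); // prints true: QName stores the string unchecked
    }
}
```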

Also, when does the definition of QNames change?

Granted it's not likely to change. However JAXB implementations are often used to support more than XML these days. Having a tolerant QName implementation is handy when you are reading/writing data from another format such as JSON.

bdoughan
  • Do you know that these are the reasons, or are you speculating? Why would String checking be expensive in a constructor? It's worth mentioning that other types related to the spec (like URL and URI) both throw exceptions out of their constructors for invalid formats. Also, when does the definition of QNames change? The implementation of QName specifies which version of the spec it implements (http://www.w3.org/TR/xmlschema-2/#QName). It makes no claim to support multiple versions, and this version hasn't been updated since 2004! – FrobberOfBits Apr 28 '14 at 14:16
  • @FrobberOfBits - I have updated my answer based on your comment. – bdoughan Apr 28 '14 at 14:29