How can one validate an XML file using an XSD in Java? We don't know the schema in advance. I would like to be able to get the schemaLocation, download the XSD, cache it and then perform the actual validation.

The problem is that with the javax.xml.parsers.DocumentBuilder/DocumentBuilderFactory classes I can't seem to get hold of the schemaLocation in advance. What's the trick for this? Which classes should I look into?

Perhaps there's a more suitable API I can use? The whole problem is that we need to validate dynamically, without (necessarily) having the XSDs locally.

How could one get a hold of the URL of schemaLocation defined in the XSD file?

I know you can set features/attributes, but that's a different thing. I need to get the schemaLocation from the XSD first.

Please advise!

carlspring

1 Answer

Given that you are using Xerces (or the JDK default parser), have you tried setting the feature http://apache.org/xml/features/validation/schema to true on the factory? Other schema-related features you can experiment with are listed at http://xerces.apache.org/xerces2-j/features.html
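A minimal sketch of what setting that feature could look like with DocumentBuilderFactory, assuming the XML file carries an xsi:schemaLocation or xsi:noNamespaceSchemaLocation hint that the parser can reach. The class name SchemaFeatureDemo and the error-collecting handler are illustrative choices, not part of the original answer. As far as I know, Xerces only reports schema errors when the general validation feature is on as well, hence the setValidating(true) call.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, named for this sketch only.
final class SchemaFeatureDemo {

    /** Parses the file with XSD validation enabled and returns the error messages. */
    static List<String> validate(File xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XSD validation needs namespace awareness
        factory.setValidating(true);     // without this, schema errors are not reported
        factory.setFeature("http://apache.org/xml/features/validation/schema", true);

        final List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        // The schema is looked up from the location hint inside the document itself.
        builder.parse(xml);
        return errors;
    }
}
```

An empty list means the document validated against whatever schema its own location hint pointed to. This does not give you the location up front, so on its own it does not solve the caching part.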

UPDATE 2 (for caching):

Implement an org.w3c.dom.ls.LSResourceResolver and set it on the SchemaFactory using the setResourceResolver method. The resolver would then either return the schema from the cache or fetch it from wherever the location points to.
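To make the caching idea concrete, here is a rough sketch under a few assumptions: the class name CachingSchemaResolver, the in-memory map and the pluggable fetcher function are all made up for illustration, and readAllBytes needs Java 9 or later. It uses SchemaFactory.newSchema() without arguments, which produces a Schema that validates against the location hints found in the document, so the schemaLocation never has to be known in advance. The resolver is set on both the factory and the Validator, since this sketch does not rely on a Validator inheriting the factory's resolver.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical resolver that keeps every schema it has fetched in memory.
final class CachingSchemaResolver implements LSResourceResolver {

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<URI, byte[]> fetcher;
    private final DOMImplementationLS ls;

    CachingSchemaResolver(Function<URI, byte[]> fetcher) throws Exception {
        this.fetcher = fetcher;
        this.ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {
        if (!XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(type) || systemId == null) {
            return null; // not a schema lookup: fall back to default resolution
        }
        // systemId is the location hint; resolve it against the referring document.
        URI location = baseURI == null
                ? URI.create(systemId)
                : URI.create(baseURI).resolve(systemId);
        byte[] xsd = cache.computeIfAbsent(location.toString(),
                key -> fetcher.apply(location));
        LSInput input = ls.createLSInput();
        input.setSystemId(location.toString());
        input.setByteStream(new ByteArrayInputStream(xsd));
        return input;
    }

    /** Default fetcher: downloads the schema from its URL. */
    static byte[] download(URI location) {
        try (InputStream in = location.toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Validates the document against the schema named by its own location hints. */
    static void validate(Source xml, LSResourceResolver resolver)
            throws SAXException, IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(resolver);
        Validator validator = factory.newSchema().newValidator();
        validator.setResourceResolver(resolver);
        validator.validate(xml); // throws SAXException if the document is invalid
    }
}
```

A typical call would be CachingSchemaResolver.validate(new StreamSource(file), new CachingSchemaResolver(CachingSchemaResolver::download)). Reusing one resolver instance across documents is what makes the cache pay off. A real implementation would probably also want eviction and a disk-backed store.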

UPDATE 3:

LSResourceResolver example (which I think will be a good starting point for you):

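import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Map;

import javax.xml.XMLConstants;

import org.slf4j.Logger;          // assuming SLF4J is the logging API in use
import org.slf4j.LoggerFactory;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/**
 * Resolves resources from a base URL
 */
public class URLBasedResourceResolver implements LSResourceResolver {

    private static final Logger log = LoggerFactory
            .getLogger(URLBasedResourceResolver.class);

    private final URI base;

    // Maps a target namespace to a schema path relative to the base URL
    private final Map<URI, String> nsmap;

    public URLBasedResourceResolver(URL base, Map<URI, String> nsmap)
            throws URISyntaxException {
        this.base = base.toURI();
        this.nsmap = nsmap;
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {
        if (log.isDebugEnabled()) {
            String msg = String
                    .format("Resolve: type=%s, ns=%s, publicId=%s, systemId=%s, baseUri=%s.",
                            type, namespaceURI, publicId, systemId, baseURI);
            log.debug(msg);
        }
        // Constant-first equals so that a null type cannot throw
        if (XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(type)) {
            if (namespaceURI != null) {
                try {
                    URI ns = new URI(namespaceURI);
                    if (nsmap.containsKey(ns)) {
                        return new MyLSInput(base.resolve(nsmap.get(ns)));
                    }
                } catch (URISyntaxException e) {
                    // namespace is not a valid URI: fall through to default resolution
                }
            }
        }
        // null tells the parser to resolve the resource itself
        return null;
    }

}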

The implementation of MyLSInput is really boring:

import java.io.InputStream;
import java.io.Reader;
import java.net.URI;

import org.w3c.dom.ls.LSInput;

// Supplies only a system ID; every stream getter returns null,
// so the parser opens the schema from that URI itself.
class MyLSInput implements LSInput {

    private final URI url;

    public MyLSInput(URI url) {
        this.url = url;
    }

    @Override
    public Reader getCharacterStream() {
        return null;
    }

    @Override
    public void setCharacterStream(Reader characterStream) {
    }

    @Override
    public InputStream getByteStream() {
        return null;
    }

    @Override
    public void setByteStream(InputStream byteStream) {
    }

    @Override
    public String getStringData() {
        return null;
    }

    @Override
    public void setStringData(String stringData) {
    }

    @Override
    public String getSystemId() {
        return url.toASCIIString();
    }

    @Override
    public void setSystemId(String systemId) {
    }

    @Override
    public String getPublicId() {
        return null;
    }

    @Override
    public void setPublicId(String publicId) {
    }

    @Override
    public String getBaseURI() {
        return null;
    }

    @Override
    public void setBaseURI(String baseURI) {
    }

    @Override
    public String getEncoding() {
        return null;
    }

    @Override
    public void setEncoding(String encoding) {
    }

    @Override
    public boolean getCertifiedText() {
        return false;
    }

    @Override
    public void setCertifiedText(boolean certifiedText) {
    }

}
forty-two
  • Concerning the first link -- I am using xerces:2.10.0 as a standalone Maven dependency outside the JDK. The second link does not work. I do not mind using another API altogether. – carlspring Feb 01 '12 at 14:07
  • Yeah, while this is indeed correct, I still don't have the location of the schema, therefore I cannot cache it and I need to implement caching. Therefore I really need to get a hold of the `schemaLocation` first. – carlspring Feb 01 '12 at 15:13
  • Sorry, didn't read question carefully enough. See second update. – forty-two Feb 01 '12 at 15:24
  • I think this is a step in the right direction. Would you happen to have an example of how to use this properly? – carlspring Feb 01 '12 at 15:28
  • Excuse me, but I don't seem to be getting this. You can set a `ResourceResolver` for the `SchemaFactory`. However, the `SchemaFactory` will give you an instance of the `Schema`, which you can only use if you know the `schemaLocation`. I don't have the `schemaLocation` in advance and would like to get it from the XML file. What am I missing here? – carlspring Feb 01 '12 at 22:15
  • Thanks! Figured it out with a little help! :) Your answer is indeed what I needed. – carlspring Feb 02 '12 at 11:47