
I have the following class:

package com.somedir.someotherdir;

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class SchemaValidator
{
 private static Logger _logger = Logger.getLogger(SchemaValidator.class.getName());

 /**
  * @param file - the relative path to and the name of the XML file to be validated
  * @return true if validation succeeded, false otherwise
  */
 public final static boolean validateXML(String file)
 {
  try
  {
   SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
   Schema schema = factory.newSchema();
   Validator validator = schema.newValidator();
   validator.validate(new StreamSource(file));
   return true;
  }
  catch (Exception e)
  {
   _logger.log(Level.WARNING, "SchemaValidator: failed validating " + file + ". Reason: " + e.getMessage(), e);
   return false;
  }
 }
}

I would like to know whether I should use schema.newValidator("dir/to/schema.xsd") after all, or whether the current version is alright. I read that there is some DoS vulnerability here; could someone provide more information on that? Also, does the path have to be absolute, or can it be relative?
Most of the XML files to be validated have their own XSD, so I'd like to use the schema that is referenced in the XML itself (xsi:noNamespaceSchemaLocation="schemaname.xsd").
The validation is done only during startup or manual reload (server software).
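On the path question, here is a note and a sketch rather than a definitive answer. `new StreamSource(String)` treats its argument as a system ID (a URI), and a relative `xsi:noNamespaceSchemaLocation` hint such as `schemaname.xsd` is resolved against the system ID of the document that contains it. Assuming the XML files are on the local file system, wrapping the path in a `java.io.File` gives the source an absolute `file:` URI, so the hint is looked up next to the XML file however the path was written. `PathAwareValidator` is a hypothetical name, and the logging from the original class is left out.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class PathAwareValidator {
    /** Hypothetical helper: validates against the schema named in the XML's own location hint. */
    public static boolean validate(String path) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // No-argument newSchema(): schemas are taken from xsi:*SchemaLocation hints.
            Schema schema = factory.newSchema();
            Validator validator = schema.newValidator();
            // StreamSource(File) sets the system ID to the file's absolute URI,
            // so a relative hint like "schemaname.xsd" resolves next to the XML file.
            validator.validate(new StreamSource(new File(path)));
            return true;
        } catch (Exception e) {
            // Validation errors and I/O problems both end up here.
            return false;
        }
    }
}
```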

jurchiks
  • Could you people stop reformatting my code and give some answers?? – jurchiks Jan 01 '11 at 16:14
  • "Most of the XMLs to be validated each have their own XSD, so I'd like to read the schema that is mentioned in the XML itself" maybe this is related: http://stackoverflow.com/questions/2829105/validating-xml-with-multiple-xsds-in-java – Cephalopod Jan 01 '11 at 17:36
  • Related maybe, but not what I need. I only need to validate it and I'm asking if THIS way is good or does it need any fixing. Your code requires more classes/methods than just this one and that's not exactly what I'm looking for. – jurchiks Jan 01 '11 at 19:52
  • @Alberto - "proper" is a subjective term. I have my code style and I don't like it when others touch it for whatever reason. Besides, it is not important in this question. – jurchiks Jul 22 '14 at 12:46

2 Answers


As I interpret it, the javax.xml.validation.Schema object returned by SchemaFactory.newSchema() will try to fetch the schemas referred to in the XML/XSD files being validated, as indicated in the corresponding xsi:schemaLocation (or xsi:noNamespaceSchemaLocation) attributes. This implies that:

  1. If your schemas refer to schemas hosted on the internet, the Schema object will try to fetch them at runtime. As far as I'm aware, the default Schema implementation does not cache those schemas. The W3C has already reported on bad coding practices resulting in a de facto DDoS on their website (up to 130M DTD requests per day!).
  2. If you are going to validate external, uncontrolled XML files, you are also exposed to the Schema trying to fetch schemas from whatever locations a possibly malicious XML document points to.
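Both points can be narrowed without giving up the hint-based approach. A minimal sketch, assuming a JAXP 1.5 or later implementation (Java 7u40+) and that all schemas are local files: the `XMLConstants.ACCESS_EXTERNAL_SCHEMA` property limits which URI schemes a validator may use when it loads schemas from location hints, so an `http:` hint makes validation fail instead of triggering a download. `LocalOnlyValidator` is a hypothetical name.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class LocalOnlyValidator {
    /** Hypothetical helper: hint-based validation that may only load schemas from file: URIs. */
    public static boolean validate(String path) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory.newSchema().newValidator();
            // Comma-separated list of allowed protocols; "file" rules out http:, https:, etc.
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "file");
            validator.validate(new StreamSource(new File(path)));
            return true;
        } catch (Exception e) {
            // Invalid documents, and documents whose schema could not be loaded, end up here.
            return false;
        }
    }
}
```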

For more (evil) attack vectors, take a look at sign's earlier answer.

To avoid this pitfall, you can store all external resources locally and use the SchemaFactory.setResourceResolver method to instruct the Schema how to fetch them.
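A sketch of that approach, assuming local copies of the remote schemas are kept in one directory and are looked up by file name only (a simplification; a real mapping might key on the full URL). `LocalSchemaResolver` is hypothetical. Because the no-argument `newSchema()` loads schemas from hints while validating, the resolver is set on the `Validator` as well as on the `SchemaFactory`. Requests it cannot serve return `null`, which tells the parser to fall back to its normal URI resolution.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class LocalSchemaResolver implements LSResourceResolver {
    private final File schemaDir;
    private final DOMImplementationLS ls;

    public LocalSchemaResolver(File schemaDir) throws ParserConfigurationException {
        this.schemaDir = schemaDir;
        // The JDK's built-in DOM implementation also implements the DOM Load & Save interfaces.
        this.ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        if (systemId == null) {
            return null;
        }
        // Map e.g. "http://example.org/schemas/foo.xsd" to <schemaDir>/foo.xsd.
        File local = new File(schemaDir, systemId.substring(systemId.lastIndexOf('/') + 1));
        if (!local.isFile()) {
            return null; // not stored locally: default resolution, which may use the network
        }
        try {
            LSInput input = ls.createLSInput();
            input.setByteStream(new ByteArrayInputStream(Files.readAllBytes(local.toPath())));
            input.setSystemId(local.toURI().toString()); // base for relative includes/imports
            return input;
        } catch (IOException e) {
            return null;
        }
    }

    public static boolean validate(String xmlPath, File schemaDir) {
        try {
            LocalSchemaResolver resolver = new LocalSchemaResolver(schemaDir);
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver); // used when the factory itself parses schemas
            Validator validator = factory.newSchema().newValidator();
            validator.setResourceResolver(resolver); // used for schemas loaded from hints
            validator.validate(new StreamSource(new File(xmlPath)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```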

Alberto
  • Way to necro-post... I don't have such concerns over my XMLs as the only XSDs they have are in the folder next to them and are written by me. – jurchiks Jul 21 '14 at 19:00
  • By necro-post, do you mean your original post has already been answered? Perhaps you could include the resolution? I find your point about DoS very interesting, and still applicable... – Alberto Jul 22 '14 at 06:19
  • you explained the DoS thing yourself. And the main question still hasn't been answered, but I'd long forgotten about it until you necro-posted. There is no reason to think that after 3 years I would still be interested in an answer to a question like this. – jurchiks Jul 22 '14 at 12:49
  • I see. I somehow have also missed the core question. But I guess you meant if you should use `SchemaFactory.newSchema(Source[] schemas)` instead of `schema.newValidator(String id)`? You don't need to answer. Your question is still of interest to me, and just would like to improve it / see it improved for future reference. The title is too vague though (there are too many general "_XML validation using XSD_" questions here). – Alberto Jul 22 '14 at 13:44
  • `I would like to know if I should use schema.newValidator("dir/to/schema.xsd") after all or is the current version - just schema.newValidator() - alright?`. ..... `SchemaFactory.newSchema(Source[] schemas) instead of schema.newValidator(String id)` - that doesn't make any sense, schema != validator. – jurchiks Jul 23 '14 at 13:23
  • Obviously. The point is that there is no `Schema.newValidator` with that signature. It is possible, though, to get a `Schema` from one (or more) XSDs with the `SchemaFactory` API, and from that schema, a `Validator`. – Alberto Jul 25 '14 at 20:39
  • I have no idea what you're talking about - http://docs.oracle.com/javase/8/docs/api/javax/xml/validation/Schema.html#newValidator-- – jurchiks Jul 27 '14 at 17:07
  • There is a `Schema.newValidator()` (the one you link), but no `Schema.newValidator(String)`. From your comments, I'm afraid I have completely misunderstood the focus of your question. But at this point, I have also lost the interest in it. – Alberto Jul 28 '14 at 06:46
  • Yeah, I have no idea where Eclipse got that signature from (I used Eclipse when I was writing this)... But it's been 3 years after all, and idgaf anymore. – jurchiks Jul 28 '14 at 14:46
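Following up on Alberto's comment: `Schema` only has the no-argument `newValidator()`, so a fixed XSD is chosen earlier, when the `Schema` is built. A minimal sketch, assuming the caller knows which XSD belongs to which XML file (`ExplicitSchemaValidator` is hypothetical). Since the `Schema` documentation describes it as thread-safe and shareable, it could also be built once at startup and reused on every reload.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class ExplicitSchemaValidator {
    /** Hypothetical helper: validates xmlPath against the XSD at xsdPath. */
    public static boolean validate(String xmlPath, String xsdPath) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The schema is fixed up front from the given file, not discovered from the XML.
            Schema schema = factory.newSchema(new File(xsdPath));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new File(xmlPath)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```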

Do you really mean the XML DTD DoS attack? If so, there are some good articles on the net:

XML Denial of Service Attacks and Defenses http://msdn.microsoft.com/en-us/magazine/ee335713.aspx

From IBM developerWorks. "Tip: Configure SAX parsers for secure processing":

Entity resolution opens a number of potential security holes in XML.[...]
- The site where the external DTD is hosted can log the communication. [...]
- The site that hosts the DTD can slow the parsing [...] It can also stop the parse completely by serving a malformed DTD.
- If the remote site changes the DTD, it can use default attribute values to inject new content into the document[...] It can change the content of the document by redefining entity references.

Though I am not sure it applies directly to your program, it may give some clues for further investigation.
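If the concern is the DTD and entity attacks described in those articles, one commonly recommended mitigation is to refuse DOCTYPE declarations in the instance documents altogether. A sketch under that assumption: the instance is parsed by a `SAXParserFactory` with the Xerces feature `http://apache.org/xml/features/disallow-doctype-decl` enabled (supported by the JDK's built-in parser, but not part of the JAXP standard, so other parsers may reject it), and fed to the `Validator` as a `SAXSource`. `NoDtdValidator` is a hypothetical name, and the XSD path is assumed to be known up front.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class NoDtdValidator {
    /** Hypothetical helper: rejects any document with a DOCTYPE, then validates against xsdPath. */
    public static boolean validate(String xmlPath, String xsdPath) {
        try {
            SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new File(xsdPath));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // XSD validation needs namespace-aware parsing
            // Xerces feature: any <!DOCTYPE> is a fatal error, so no DTD is fetched
            // and no entities are expanded.
            spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            Validator validator = schema.newValidator();
            InputSource input = new InputSource(new File(xmlPath).toURI().toString());
            validator.validate(new SAXSource(reader, input));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```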

Alberto
sign
  • I mean this: http://download.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/SchemaFactory.html#newSchema() Note that the use of schema location hints introduces a vulnerability to denial-of-service attacks. – jurchiks Jan 02 '11 at 19:54
  • +1 The first link is a very good reading. Unfortunately, the second link (@ibm) doesn't work anymore. – Alberto Jul 21 '14 at 08:54