
I created a file system that stores the metadata of files and folders in an OWL file.

For the file system, I am using a Java binding of FUSE, i.e. FUSE-JNA.

For OWL, I am using Jena.

At first my file system runs fine with no errors. But after some time my program stops reading the .owl file and throws errors. One of them is shown below.

Errors I get while reading the .owl file:

SEVERE: Exception thrown: org.apache.jena.riot.RiotException: [line: 476, col: 52] The value of attribute "rdf:about" associated with an element type "File" must not contain the '<' character.
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.fatalError(LangRDFXML.java:252)
com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:48)
com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:209)
com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:239)
org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
...

When I opened my .owl file, I found that Jena is not writing it correctly. In the picture below, the code highlighted in blue as number 3 is incomplete: some code is missing there.

Second, the code highlighted in blue as number 2 is also written wrongly. In my ontology, this is a property of File, so it should be written like the code highlighted as number 1.

Both the number 1 and number 2 code were written by Jena. Most of the OWL code is written correctly, like number 1, but sometimes Jena writes it wrongly, like number 2. I do not know why.

[Image: errors in owl files]

This is how I write to the .owl file using the Jena API:

public void setDataTypeProperty(String resourceURI, String propertyName, String propertyValue) // Creates a new datatype property. Accepts three arguments: the URI of the resource as a string, the property name (i.e. #hasPath), and the property value as a string.
{
    Model model = ModelFactory.createDefaultModel();

    // Read the model from the file
    InputStream in = FileManager.get().open(inputFileName);
    if (in == null)
    {
        throw new IllegalArgumentException("File: " + inputFileName + " not found");
    }
    model.read(in, "");
    try {
        in.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Add the property to the model
    Resource resource = model.createResource(resourceURI);
    resource.addProperty(model.createProperty(baseURI + propertyName), model.createLiteral(propertyValue));

    // Write the model back to the file
    try {
        FileWriter out = new FileWriter(inputFileName);
        model.write(out, "RDF/XML-ABBREV");
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

How can I fix the errors that Jena produces, highlighted in blue as number 2 and number 3?

  • The highlighting in the text is nice, but the image is rather small print, too. Copy and paste the important text into the question, please. – Joshua Taylor Jul 16 '14 at 13:56
    Can you provide a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) that we can use to reproduce the problem? You provided relevant code, which helps, but can you provide a complete working example that reproduces this behavior? – Joshua Taylor Jul 16 '14 at 14:00
  • @JoshuaTaylor You are correct that the text should be in the question. As a short term workaround, however, you can (in Firefox at least) right click the image and select `View Image` to see a larger version of it. The image is sufficient resolution to read when enlarged. – Rob Hall Jul 16 '14 at 14:09
  • Is there any chance that you have any unexpected characters in your text that might not show up in the editor that you're showing us? Perhaps a [right to left mark](http://en.wikipedia.org/wiki/Right-to-left_mark)? – Joshua Taylor Jul 16 '14 at 14:20
  • Do you have any way to guesstimate how much is missing after #3 - "admi"? Or might it even be duplicated? Also w/r/t #2: do you know which File it's supposed to be on? – Ed Staub Jul 16 '14 at 21:38

3 Answers


There is an input-sanitization issue in your method. I cannot be certain that your input data is invalid, but it is certainly something that should be tested in any method that programmatically constructs URIs or literals.

URIs

For example, the following two lines are dangerous because they can let through characters that are not allowed in a URI, or characters in literal values that cannot be serialized as XML.

Resource resource = model.createResource(resourceURI);
resource.addProperty(model.createProperty(baseURI+propertyName), model.createLiteral(propertyValue));

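To fix the problem with URIs, use URLEncoder to sanitize the URIs themselves:

final String uri  = URLEncoder.encode(resourceURI, "UTF-8");
final String puri = URLEncoder.encode(baseURI+propertyName, "UTF-8");
final Resource resource = model.createResource(uri);
resource.addProperty(model.createProperty(puri), model.createLiteral(propertyValue));

Be aware that URLEncoder performs HTML form encoding: it also escapes ':', '/' and '#' (and turns spaces into '+'), so in practice it should be applied to the components you insert into a URI rather than to a complete URI.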
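A small stdlib-only sketch of the difference between encoding a complete URI and encoding only its local part. The base URI, the local name, and the `buildUri` helper are hypothetical; the `Charset` overload of `URLEncoder.encode` used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeLocalPart {
    // Hypothetical base URI standing in for the question's baseURI field.
    static final String BASE_URI = "http://example.org/fs#";

    // Percent-encode only the local part and append it to a base URI
    // that is assumed to be valid already.
    static String buildUri(String base, String localName) {
        return base + URLEncoder.encode(localName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Encoding the complete URI escapes ':', '/' and '#' as well:
        // prints http%3A%2F%2Fexample.org%2Ffs%23hasPath
        System.out.println(URLEncoder.encode(BASE_URI + "hasPath", StandardCharsets.UTF_8));
        // Encoding only the local part keeps the URI structure intact:
        // prints http://example.org/fs#my+file%3C1%3E.txt
        System.out.println(buildUri(BASE_URI, "my file<1>.txt"));
    }
}
```

Because URLEncoder implements `application/x-www-form-urlencoded` encoding, a space becomes `+` rather than `%20`; replace `+` with `%20` afterwards if you need strict percent-encoding.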

To test for problematic URIs, you can use Jena's IRIFactory to validate that the URI you are constructing adheres to a particular specification.
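If you would rather not depend on Jena's IRI classes, a rough stdlib-only check can be built on `java.net.URI`. This is a swapped-in approximation, not Jena's IRIFactory: `java.net.URI` follows RFC 2396, so it will not agree with an IRI checker in every case. The `isValidAbsoluteUri` helper is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCheck {
    // Returns true only if the string parses as an absolute URI according to
    // java.net.URI. Null, blank, relative and malformed input is rejected.
    static boolean isValidAbsoluteUri(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            // e.g. an illegal '<' or space somewhere in the string
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidAbsoluteUri("http://example.org/fs#file1"));  // true
        System.out.println(isValidAbsoluteUri("http://example.org/fs#<file>")); // false
        System.out.println(isValidAbsoluteUri(""));                             // false
    }
}
```

Calling such a check before `model.createResource(...)` lets you reject a bad URI at the point where it is built, instead of discovering it later when the .owl file fails to parse.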

Literals

Solving the problem of literals is a little trickier. You are not getting an exception indicating a bad literal value, but I am including this for completeness (so you can sanitize all inputs, not only the ones that may be causing a problem now).

Jena's writers do not check literal values until they are serialized as XML. The pattern they use to detect invalid XML characters covers only the characters that must be escaped under the RDF/XML specification; Jena delegates the final validation (and exception throwing) to the underlying XML library. This makes sense, because a future RDF serialization might allow every character to be expressed. I was recently bitten by this (for example, by a string containing a backspace character), so I created a stricter pattern to detect the situation eagerly at runtime.

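// C0 controls except \n, \t, \r; DEL; C1 controls; surrogates; U+FFFF and U+FFFE (hex escapes, since "\127" in a Java string literal is octal, i.e. 'W')
final Pattern elementContentEntities = Pattern.compile( "[\\x00-\\x1F&&[^\\n\\t\\r]]|\\x7F|[\\x80-\\x9F]|[\\uD800-\\uDFFF]|\\uFFFF|\\uFFFE" );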
final Matcher m = elementContentEntities.matcher( propertyValue );
if( m.find() ) {
    // TODO sanitise your string literal, it contains invalid characters
} 
else {
    // TODO your string is good.
}
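A self-contained sketch of that check, together with one possible way to fill in the first TODO. The sanitizing policy (simply dropping each offending character) and the class and method names are hypothetical; you might prefer to reject the value or substitute a replacement character instead.

```java
import java.util.regex.Pattern;

public class LiteralSanitizer {
    // C0 controls other than tab/LF/CR, DEL, C1 controls, surrogates
    // (Java regex matches only unpaired ones), and the non-characters U+FFFE/U+FFFF.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[\\x00-\\x1F&&[^\\n\\t\\r]]|\\x7F|[\\x80-\\x9F]|[\\uD800-\\uDFFF]|\\uFFFF|\\uFFFE");

    static boolean containsInvalidChars(String value) {
        return INVALID_XML_CHARS.matcher(value).find();
    }

    // Hypothetical policy: remove every offending character.
    static String sanitize(String value) {
        return INVALID_XML_CHARS.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "hello\bworld"; // contains a backspace (U+0008)
        System.out.println(containsInvalidChars(raw)); // true
        System.out.println(sanitize(raw));             // helloworld
    }
}
```

Running the literal through `sanitize` (or rejecting it) before `model.createLiteral(propertyValue)` keeps such characters out of the model, so they never reach the XML serializer.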
Rob Hall
    **I was recently bit by it (for example, a string that contains a backspace character)** I was just starting to wonder whether perhaps the literal has a carriage return in it. Or, even more interesting, and I'm making some assumptions based on the OP's username and some of the text in the question, whether there might be a [right to left mark](http://en.wikipedia.org/wiki/Right-to-left_mark) in the text. – Joshua Taylor Jul 16 '14 at 14:19

The nature of the truncation at #3 - "admi" - leads me to think that maybe this is a problem with your underlying data transport and storage, and has nothing to do with XML, RDF, Jena, or anything else up at this level. Maybe an ignored exception?
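One way to rule out swallowed write errors is to let them propagate and to never overwrite the existing file in place. Below is a minimal, stdlib-only sketch of a swapped-in technique (write to a temporary file, then move it over the target), assuming the serialized RDF/XML is already available as a string, for example from `model.write(stringWriter, "RDF/XML-ABBREV")` with a `java.io.StringWriter`. The `writeAtomically` helper is hypothetical, and on some file systems the move may throw `AtomicMoveNotSupportedException`.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicFileWrite {
    // Writes the whole document to a temporary file in the same directory, then
    // moves it over the target in one step. IOExceptions reach the caller instead
    // of being swallowed, and a failed write leaves the previous file untouched.
    static void writeAtomically(Path target, String content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "model-", ".tmp");
        try {
            try (Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                out.write(content);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("metadata", ".owl");
        writeAtomically(target, "<rdf:RDF/>\n");
        System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```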

Ed Staub

My main program was sometimes passing a blank/null resourceURI argument to the setDataTypeProperty method. That is what was causing the problem.

So I modified my code and added two lines at the start of the method:

public void setDataTypeProperty(String resourceURI, String propertyName, String propertyValue) // Creates a new datatype property. Accepts three arguments: the URI of the resource as a string, the property name (i.e. #hasPath), and the property value as a string.
{
    if (resourceURI == null || resourceURI.trim().isEmpty())
        return;
...
...

I have now been running it for a few days and have not encountered the errors mentioned above.