Using Jena to deserialize RDF that includes blank nodes results in unique IDs for those nodes each time the same RDF is deserialized. If identical RDF is deserialized multiple times and merged, the blank nodes become duplicated. Is there a way to avoid or remove the duplication?

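import java.io.StringReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class BlankNodeMerge {
    // One subject, one predicate, and an anonymous (blank node) object with two types.
    static final String RDF =
            "<http://www.foo.com/subject> " +
                "<http://www.foo.com/predicate> " +
                    "[ a  <http://www.foo.com/bar> , <http://www.foo.com/baz> ] .";

    public static void main(String... args) {
        // Each read() allocates a fresh blank node, so m1 and m2 do not share one.
        Model m1 = ModelFactory.createDefaultModel().read(new StringReader(RDF), null, "ttl");
        Model m2 = ModelFactory.createDefaultModel().read(new StringReader(RDF), null, "ttl");
        Model m3 = m1.union(m2);
        RDFDataMgr.write(System.out, m3, Lang.TURTLE);
    }
}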

//<http://www.foo.com/subject>
//    <http://www.foo.com/predicate>  [ a  <http://www.foo.com/bar> , <http://www.foo.com/baz> ] ;
//    <http://www.foo.com/predicate>  [ a  <http://www.foo.com/bar> , <http://www.foo.com/baz> ] .

This contrived example is a bit silly, but consider that I'm trying to merge RDF files that may or may not be identical.
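For the case where two files are either identical or not, one possible approach is to test the models for isomorphism before merging and to skip the second model when they match. The following is a minimal sketch, not a general solution: the class name `MergeIfDifferent` and its helper methods are made up for illustration, and it only helps when the whole graphs are isomorphic. Files that merely overlap would still produce duplicated blank nodes.

```java
import java.io.StringReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical helper class, named for illustration only.
public class MergeIfDifferent {

    // Parse a Turtle string into a fresh in-memory model.
    static Model parseTurtle(String ttl) {
        Model m = ModelFactory.createDefaultModel();
        m.read(new StringReader(ttl), null, "TTL");
        return m;
    }

    // Copy a into a new model and add b only if it is not isomorphic to a.
    // isIsomorphicWith() compares the graphs up to blank node renaming.
    static Model mergeUnlessIsomorphic(Model a, Model b) {
        Model merged = ModelFactory.createDefaultModel().add(a);
        if (!a.isIsomorphicWith(b)) {
            merged.add(b);
        }
        return merged;
    }

    public static void main(String[] args) {
        String rdf = "<http://www.foo.com/subject> <http://www.foo.com/predicate> "
                   + "[ a <http://www.foo.com/bar> , <http://www.foo.com/baz> ] .";
        Model merged = mergeUnlessIsomorphic(parseTurtle(rdf), parseTurtle(rdf));
        // Three triples here; adding both models unconditionally would give six.
        System.out.println(merged.size());
    }
}
```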

jaco0646
  • These triples are not "identical". Using another output serialization, you can see that these blank nodes have different blank node labels. See [this answer](https://stackoverflow.com/a/44498034/7879193) for some explanation. As a workaround, you could probably use Jena-specific blank node [pseudo-URIs](https://jena.apache.org/documentation/query/extension.html#blank-node-labels) in your input serialization. AFAIK, merging isomorphic RDF (sub)graphs is [hard](http://blog.datagraph.org/2010/03/rdf-isomorphism). – Stanislav Kralin Sep 09 '17 at 09:19
  • I have tried modifying the above code to use NTriples rather than Turtle. The blank node labels are then visible in the serialization; however, the labels are still deserialized to new labels each time. – jaco0646 Sep 10 '17 at 15:27
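
Regarding labels being regenerated on every read: when the input carries explicit labels, as N-Triples does, RIOT's parser can be asked to keep them by supplying a `LabelToNode` policy. The sketch below uses `LabelToNode.createUseLabelAsGiven()`, which is a different technique from the pseudo-URI approach mentioned in the comment above. The class name `KeepBlankNodeLabels` and the sample data are assumptions for illustration. Blank node labels are normally scoped to a single document, so this is only reasonable when the same label is known to denote the same node across files; unrelated files that both happen to use `_:b0` would be merged incorrectly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.lang.LabelToNode;

// Hypothetical class, named for illustration only.
public class KeepBlankNodeLabels {

    // The question's data rewritten as N-Triples with an explicit label (assumed: _:b0).
    static final String NT =
        "<http://www.foo.com/subject> <http://www.foo.com/predicate> _:b0 .\n" +
        "_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.foo.com/bar> .\n" +
        "_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.foo.com/baz> .\n";

    // Parse N-Triples while reusing the labels in the input instead of
    // generating a fresh blank node for each parse run.
    static Model parseKeepingLabels(String ntriples) {
        Model model = ModelFactory.createDefaultModel();
        RDFParser.create()
                .source(new ByteArrayInputStream(ntriples.getBytes(StandardCharsets.UTF_8)))
                .lang(Lang.NTRIPLES)
                .labelToNode(LabelToNode.createUseLabelAsGiven())
                .parse(model.getGraph());
        return model;
    }

    public static void main(String[] args) {
        Model m1 = parseKeepingLabels(NT);
        Model m2 = parseKeepingLabels(NT);
        // Both parses map _:b0 to the same node, so the merged model
        // holds a single copy of each triple.
        Model merged = ModelFactory.createDefaultModel().add(m1).add(m2);
        System.out.println(merged.size());
    }
}
```

Under these assumptions the merged model should contain three triples rather than six, because the two reads agree on the blank node's identity.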
