
I have an RDF document containing multiple resources that I generate from my data model. Because each resource is serialized separately and the results are concatenated, the output contains the same prefix declaration several times (when serialized as N3). It looks something like this:

@prefix dc: <someURL>.

<someURL/Tony_Benn>
     dc:title "Tony Benn";
     dc:publisher "Wikipedia".

@prefix dc: <someURL>.

<someURL/Someone_Else>
     dc:title "Someone Else";
     dc:publisher "Wikipedia".

I am using the Jena API to create the RDF, but I've written a wrapper around the API to keep my code decoupled from it. Is there a better way to approach this problem, or is there a way to remove the duplicate prefixes?

Joshua Taylor

2 Answers


If you're using a utility (e.g., Jena's rdfcat) to concatenate the RDF documents, then you have nothing to worry about. Prefixes just make reading and writing a little easier; RDF-aware tools don't really care about them. If being able to concatenate data with text-based tools (i.e., tools that aren't RDF-aware) is important, then you should probably use the N-Triples format. It is very simple, just

subject predicate object .

with one triple per line. Since there is no provision for prefixes, text concatenation simply works. N-Triples also has the (even nicer) feature that if you need to split up a document, e.g., for distributed processing, you can just split the file, as long as you split at linebreaks. That's impossible with N3, RDF/XML, and other more complicated formats.
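
As a rough illustration of that point, here is a minimal sketch in plain Java (no RDF library) that concatenates N-Triples documents as text and splits the result at line breaks. The class name `NTriplesText`, its two helper methods, and the `example.org` URIs are made up for this example. One caveat to keep in mind: blank node labels are scoped to a document, so text concatenation is only safe if the documents contain no blank nodes or their labels do not clash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class NTriplesText {

    // Concatenate N-Triples documents as plain text. There are no prefix
    // declarations in N-Triples, so nothing can be duplicated or clash.
    public static String concat(List<String> docs) {
        StringBuilder out = new StringBuilder();
        for (String doc : docs) {
            out.append(doc);
            // Make sure every document ends with a line break before the next one starts.
            if (!doc.isEmpty() && !doc.endsWith("\n")) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    // Split a document at line breaks into chunks of at most maxLines triples.
    // Each chunk is itself a valid N-Triples document.
    public static List<String> split(String doc, int maxLines) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String line : doc.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            current.append(line).append('\n');
            if (++count == maxLines) {
                chunks.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String a = "<http://example.org/Tony_Benn> <http://purl.org/dc/elements/1.1/title> \"Tony Benn\" .\n";
        String b = "<http://example.org/Someone_Else> <http://purl.org/dc/elements/1.1/title> \"Someone Else\" .";
        String merged = concat(Arrays.asList(a, b));
        System.out.print(merged);
        System.out.println(split(merged, 1).size() + " chunk(s)");
    }
}
```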

Joshua Taylor
  • Thanks for the quick response. Yes, I am aware of the N-Triples format. My bad; I should have mentioned the whole context of the work, which is to be able to visualise the RDF in JSON. Right now, the tools (JavaScript) that consume RDF/XML and produce JSON are not standardised, hence the more readable format of Notation3. Yes, I do agree that N-Triples is much easier to read than N3, and I would appreciate pointers to any tools or APIs that help convert RDF to JSON. P.S.: I have to do the conversion programmatically, hence I cannot use rdfcat. – Narayanan Krishnan May 23 '13 at 14:09
  • @narayanankrish The `--help` option doesn't mention it, but you can use `-out RDF/JSON` with rdfcat, so that's a very simple solution. I don't see why doing something programmatically precludes the use of rdfcat, but if you can't use rdfcat, Jena is open source, so you can still look at the source of rdfcat for some examples. Jena supports [writing RDF/JSON (distinct from JSON-LD)](http://jena.apache.org/documentation/io/), so if you simply read the RDF in, you can write it back out as RDF/JSON. – Joshua Taylor May 23 '13 at 14:44
  • @NarayananKrishnan I see that you accepted this answer a few days ago, and just unaccepted it. Did you encounter a problem with it such that it doesn't meet your needs? – Joshua Taylor May 24 '13 at 14:07

Thanks @Joshua. I thought about it. Rather than removing the duplicate entries, I think it's better not to have them in the first place. Rather than concatenating two RDF documents, I found it better to make a union of the respective models. Here is what I did:

  • Read the documents into models.
  • Made a union of the models. This can be done using the union(Model model) method, or better:
  • Read the first RDF file into a model using the read(.. ,.. ,..) method (because I had it as a string, I read it as an InputStream), then added the statements from the second one. As @Joshua suggested in the comment below, this is much more efficient in memory usage.
  • Got the unified model out.

I found this much easier and more predictable, and it handled the prefixes much better. I could do this with Notation3 as well.

    • Ah, so you took the advice in the comment and decided to do what rdfcat does (see [rdfcat.java](https://github.com/apache/jena/blob/trunk/jena-core/src/main/java/jena/rdfcat.java)), i.e., read models, then write a combined model. Note that rather than using `union(Model)`, you might consider using just one model and `Model#read(…)` to _add_ the contents from the files into that single model. That should use about half as much memory (union creates a completely _new_ model). If you need separate models for some other reason, you can use an OntModel with sub-models. Glad you found a solution! – Joshua Taylor May 24 '13 at 14:24
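
The approach discussed in this answer and its comment (read every document into a single model with `Model#read`, then write that model out once) might look roughly like the sketch below. It assumes a recent Jena release, where the classes live under `org.apache.jena`; the 2013-era releases used `com.hp.hpl.jena` instead. The class name `MergeRdf`, the helper methods, and the `example.org` URIs (standing in for `someURL`) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical helper class, for illustration only.
class MergeRdf {

    // The documents are held as strings here, so wrap each one in an InputStream.
    static InputStream stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Read every document into the same model. Each read(...) call adds
    // statements to the model, so no second model or union is needed.
    public static Model merge(String lang, String... docs) {
        Model model = ModelFactory.createDefaultModel();
        for (String doc : docs) {
            model.read(stream(doc), null, lang); // null base: the sample uses absolute URIs only
        }
        return model;
    }

    public static void main(String[] args) {
        String first =
            "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
            + "<http://example.org/Tony_Benn>\n"
            + "     dc:title \"Tony Benn\";\n"
            + "     dc:publisher \"Wikipedia\".\n";
        String second =
            "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
            + "<http://example.org/Someone_Else>\n"
            + "     dc:title \"Someone Else\";\n"
            + "     dc:publisher \"Wikipedia\".\n";

        Model merged = merge("TURTLE", first, second);

        // The writer takes the prefixes from the model's prefix mapping,
        // so the serialized result is a single document rather than two pasted together.
        merged.write(System.out, "TURTLE");
    }
}
```

Passing `"RDF/JSON"` to `write` instead of `"TURTLE"` should produce the RDF/JSON serialization mentioned in the comments on the other answer.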