I want to serialize an rdf4j.model.Model in the RDF Turtle format. The model should contain IRIs with the characters { and }.

When I write the model with rdf4j.rio.Rio, the {} characters come out as %7B%7D. Is there a way to avoid this, e.g. by creating an rdf4j.model.IRI with path and query variables, or by configuring the writer to preserve the {} characters?

I am using org.eclipse.rdf4j:rdf4j-runtime:3.6.2.

An example snippet:

import org.eclipse.rdf4j.model.BNode;
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
import org.eclipse.rdf4j.model.util.ModelBuilder;
import org.eclipse.rdf4j.rio.*;
import org.eclipse.rdf4j.rio.helpers.BasicWriterSettings;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExamplePathVariable {

    private final static Logger LOG = Logger.getLogger(ExamplePathVariable.class.getCanonicalName());
    public static void main(String[] args) {

        SimpleValueFactory rdf = SimpleValueFactory.getInstance();
        ModelBuilder modelBuilder = new ModelBuilder();

        BNode subject = rdf.createBNode();
        IRI predicate = rdf.createIRI("http://example.org/onto#hasURI");

        // IRI with special characters !
        IRI object = rdf.createIRI("http://example.org/{token}");

        modelBuilder.add(subject, predicate, object);

        String turtleStr = writeToString(RDFFormat.TURTLE, modelBuilder.build());
        LOG.log(Level.INFO, turtleStr);
    }

    static String writeToString(RDFFormat format, Model model) {
        // ByteArrayOutputStream.close() is a no-op, so no try/finally is needed.
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        Rio.write(model, out, format,
                new WriterConfig().set(BasicWriterSettings.INLINE_BLANK_NODES, true));

        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        return new String(out.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
    }
}

This is what I get:

INFO: 
[] <http://example.org/onto#hasURI> <http://example.org/%7Btoken%7D> .
stln
  • I haven't got a clue what you're doing here, but I can tell you that the URI you've got there contains those curly braces encoded correctly, so they *are* being preserved probably. – g00se Jul 11 '21 at 12:52
  • Thanks, they are being preserved and there would not be an issue if the Rio parser could then treat %7B, %7D as {,} when reading a turtle file that contains `http://example.org/%7Btoken%7D` (at least in a closed application). But since this is not the case (at least without the knowledge of configuring the Rio parser for doing so), I'm still looking for a way for generating a .ttl file that actually shows the characters {,}. I could do the replacement manually, but I'm wondering if there is a more elegant way through rdf4j. – stln Jul 11 '21 at 14:09
  • Also, I'm sorry if the question is not very clear. If there is more info I can give, I'll gladly do so. For example, on the [`NTriplesWriterSettings`](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/rio/helpers/NTriplesWriterSettings.html), there is an `ESCAPE_UNICODE` setting. But there is no such setting for the `TurtleWriter` nor the `BasicWriter`. – stln Jul 11 '21 at 14:14
  • "if the Rio parser could then treat %7B, %7D as {,} when reading a turtle file" Well, how is that actually going, since the code above is just dumping what has been written? – g00se Jul 11 '21 at 14:44

1 Answer

There is no easy way to do what you want, because that would result in a syntactically invalid URI representation in Turtle.

The characters '{' and '}', even though they are not reserved characters in URIs, are not allowed to appear unencoded in a URI or an IRI (see https://datatracker.ietf.org/doc/html/rfc3987). The Turtle grammar likewise excludes them from its IRIREF production. The only legal way to serialize them is to percent-encode them.
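
To see the rule in action without RDF4J, here is a minimal sketch that uses only the JDK. Note the assumptions: java.net.URI implements the older RFC 2396 (as amended by RFC 2732) rather than RFC 3987, so it only illustrates the same restriction on braces, and the class and method names (BraceEncodingSketch, isStrictlyValid, encodePath) are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, JDK only (no RDF4J involved).
class BraceEncodingSketch {

    // Returns true if java.net.URI parses the string as-is.
    static boolean isStrictlyValid(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from its components. The multi-argument constructor
    // percent-encodes characters that are illegal in the path.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw braces are rejected by the parser.
        System.out.println(isStrictlyValid("http://example.org/{token}"));      // false
        // The percent-encoded form is accepted.
        System.out.println(isStrictlyValid("http://example.org/%7Btoken%7D"));  // true
        // Component-wise construction encodes the braces.
        System.out.println(encodePath("http", "example.org", "/{token}"));
        // http://example.org/%7Btoken%7D
    }
}
```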

As an aside, the only reason this bit of code:

IRI object = rdf.createIRI("http://example.org/{token}");

succeeds is that the SimpleValueFactory you are using does not do character validation (for performance reasons). If you instead use the recommended approach (since RDF4J 3.5) of using the Values static factory:

IRI object = Values.iri("http://example.org/{token}");

...you would immediately have gotten a validation error.

If you don't know in advance whether an input string contains invalid characters, and you want a best-effort conversion to a legal IRI, you can use ParsedIRI.create, which percent-encodes the offending characters:

IRI object = Values.iri(ParsedIRI.create("http://example.org/{token}").toString());
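
For the closed-application case raised in the comments, where the braces only need to be shown or matched as template variables after the Turtle has been read back, the manual replacement can be kept in one small helper. This is only a sketch under stated assumptions: the class and method names (TemplateBraces, restoreBraces, encodeBraces) are hypothetical, it uses plain string operations rather than any RDF4J API, and it cannot tell a template brace from a brace that was meant literally.

```java
// Hypothetical helper, JDK only: converts between the percent-encoded form
// stored in the RDF and the brace form used for display or template matching.
class TemplateBraces {

    // Turns %7B / %7D (either hex case) back into { and }.
    // Do not feed the result back into an IRI; it is for display only.
    static String restoreBraces(String encodedIri) {
        return encodedIri
                .replaceAll("(?i)%7B", "{")
                .replaceAll("(?i)%7D", "}");
    }

    // Percent-encodes only the braces, leaving everything else untouched.
    static String encodeBraces(String template) {
        return template
                .replace("{", "%7B")
                .replace("}", "%7D");
    }

    public static void main(String[] args) {
        String stored = encodeBraces("http://example.org/{token}");
        System.out.println(stored);                 // http://example.org/%7Btoken%7D
        System.out.println(restoreBraces(stored));  // http://example.org/{token}
    }
}
```

The stored IRI stays legal Turtle, and the brace form exists only at the application boundary.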
Jeen Broekstra
  • Thanks a lot. I was genuinely confused because I was expecting that these characters do not need to be %-encoded since this is an IRI (and not just any URI). I should instead search for a library that supports URI templates. – stln Jul 18 '21 at 19:21