Table of contents

  • The context
  • What I want to do
  • Why do I want to do this?

The context

I know how to generate a Turtle file using Python and rdflib. The minimal working example below writes a file called output.txt in Turtle format.

import rdflib

g = rdflib.Graph()

g.add((rdflib.URIRef('http://example.org/my_subject_1'),
       rdflib.URIRef('http://example.org/my_predicate_1'),
       rdflib.URIRef('http://example.org/my_object_1')))

g.add((rdflib.URIRef('http://example.org/my_subject_1'),
       rdflib.URIRef('http://example.org/my_predicate_2'),
       rdflib.URIRef('http://example.org/my_object_1')))

g.serialize('output.txt', format='turtle')

$ source venv/bin/activate
$ python main.py
$ cat output.txt
@prefix ns1: <http://example.org/> .

ns1:my_subject_1 ns1:my_predicate_1 ns1:my_object_1 ;
    ns1:my_predicate_2 ns1:my_object_1 .

What I want to do

There are some changes that I'd like to make to the output of serialize.

  1. Put every predicate-object pair on its own line, including the first one, which by default shares a line with the subject. The output should look like the code block below.
@prefix ns1: <http://example.org/> .

ns1:my_subject_1
    ns1:my_predicate_1 ns1:my_object_1 ;
    ns1:my_predicate_2 ns1:my_object_1 .
  2. Indent every predicate-object pair by two spaces instead of the default four. The output should look like the code block below.
@prefix ns1: <http://example.org/> .

ns1:my_subject_1
  ns1:my_predicate_1 ns1:my_object_1 ;
  ns1:my_predicate_2 ns1:my_object_1 .
  3. Remove the space character between each object and the ";" or "." that follows it. The same applies to the "." that ends the @prefix line.
@prefix ns1: <http://example.org/>.

ns1:my_subject_1
  ns1:my_predicate_1 ns1:my_object_1;
  ns1:my_predicate_2 ns1:my_object_1.

In summary, I would like serialize to produce the output shown in the last code block.
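To make the three requirements concrete, here is a minimal sketch of a checker that reports where a Turtle text deviates from the wanted layout. It uses only the standard library. The function name layout_problems and the two sample constants are hypothetical and not part of rdflib. It is a naive line-based check that assumes one predicate-object pair per line and no multi-line literals, so treat it as an illustration rather than a Turtle validator.

```python
import re

# Output of the default 'turtle' serializer, copied from the question.
DEFAULT_OUTPUT = """@prefix ns1: <http://example.org/> .

ns1:my_subject_1 ns1:my_predicate_1 ns1:my_object_1 ;
    ns1:my_predicate_2 ns1:my_object_1 .
"""

# The layout asked for in the question.
WANTED_OUTPUT = """@prefix ns1: <http://example.org/>.

ns1:my_subject_1
  ns1:my_predicate_1 ns1:my_object_1;
  ns1:my_predicate_2 ns1:my_object_1.
"""


def layout_problems(turtle_text, indent="  "):
    """Return a list of layout problems (hypothetical helper, line-based)."""
    problems = []
    for number, line in enumerate(turtle_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank separator lines are fine
        # Requirement 3: no whitespace before the terminating ';' or '.'.
        if re.search(r"\s[;.]$", line):
            problems.append(f"line {number}: space before terminator")
        if line.startswith("@"):
            continue  # @prefix / @base directives are not indented
        if line[0].isspace():
            # Requirement 2: predicate-object lines use exactly `indent`.
            leading = line[: len(line) - len(line.lstrip())]
            if leading != indent:
                problems.append(
                    f"line {number}: indentation is {leading!r}, expected {indent!r}"
                )
        elif len(line.split()) > 1:
            # Requirement 1: the subject must be alone on its line.
            problems.append(f"line {number}: subject shares a line with a predicate")
    return problems


if __name__ == "__main__":
    for problem in layout_problems(DEFAULT_OUTPUT):
        print(problem)
    print(layout_problems(WANTED_OUTPUT))
```

Running the checker on the wanted layout returns an empty list, while the default output is flagged on every non-blank line.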

Why do I want to do this?

I'm generating some Turtle files containing a lot of information. Sometimes I'll need to edit those files manually, so I want them to have a structure that I find more readable.

rdrg109

1 Answer

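Create a class that inherits from TurtleSerializer and override the methods that produce the formatting you want to change. Then register the class as an rdflib serializer plugin and pass its name as the format argument of serialize.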

$ cat turtle_custom/serializer.py
from rdflib.plugins.serializers.turtle import TurtleSerializer

SUBJECT = 0
VERB = 1
OBJECT = 2

class TurtleSerializerCustom(TurtleSerializer):
    indentString = "  "

    # Remove the space between each @base/@prefix definition and the
    # period that ends the line.

    def startDocument(self):
        self._started = True
        ns_list = sorted(self.namespaces.items())

        if self.base:
            self.write(self.indent() + "@base <%s>.\n" % self.base)
        for prefix, uri in ns_list:
            self.write(self.indent() + "@prefix %s: <%s>.\n" % (prefix, uri))
        if ns_list and self._spacious:
            self.write("\n")

    # Remove the space before the period that ends a group of
    # predicate-object statements. The first predicate is no longer
    # written here; predicateList puts it on its own line.

    def s_default(self, subject):
        self.write("\n" + self.indent())
        self.path(subject, SUBJECT)
        self.predicateList(subject)
        self.write(".")
        return True

    # Put the first predicate-object pair on its own line and remove
    # the space before the semicolon that separates the pairs.

    def predicateList(self, subject, newline=False):
        properties = self.buildPredicateHash(subject)
        propList = self.sortProperties(properties)
        if len(propList) == 0:
            return
        self.write("\n" + self.indent(1))
        self.verb(propList[0], newline=True)
        self.objectList(properties[propList[0]])
        for predicate in propList[1:]:
            self.write(";\n" + self.indent(1))
            self.verb(predicate, newline=True)
            self.objectList(properties[predicate])


$ cat test.py
import rdflib

rdflib.plugin.register('turtle_custom', rdflib.plugin.Serializer, 'turtle_custom.serializer', 'TurtleSerializerCustom')

g = rdflib.Graph()

g.add((rdflib.URIRef('http://example.org/my_subject_1'),
       rdflib.URIRef('http://example.org/my_predicate_1'),
       rdflib.URIRef('http://example.org/my_object_1')))

g.add((rdflib.URIRef('http://example.org/my_subject_1'),
       rdflib.URIRef('http://example.org/my_predicate_2'),
       rdflib.URIRef('http://example.org/my_object_1')))

g.serialize('output.txt', format='turtle_custom')

$ python test.py
$ cat output.txt
@prefix ns1: <http://example.org/>.

ns1:my_subject_1
  ns1:my_predicate_1 ns1:my_object_1;
  ns1:my_predicate_2 ns1:my_object_1.

rdrg109