print rdflib.Graph using serialize() in the same layout

Question

I'm having a problem when using rdflib's serialize() method to print a graph: the layout of the output differs from the layout of the original file used to create the graph.

The code is as follows

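from rdflib import Graph

mapping_graph = Graph().parse("valid_mapping.ttl", format="ttl")
output = mapping_graph.serialize(format="ttl")
# rdflib < 6 returns bytes here; rdflib >= 6 returns str
print(output.decode("utf-8") if isinstance(output, bytes) else output)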

Which outputs

<file:///home/alex/Desktop/Mapping-Quality-Framework/Mapping-Quality-Model/valid_mapping.ttl#TripleMap1>  rr:logicalTable [ rr:tableName "people" ] ;
    rr:predicateObjectMap [ rr:objectMap [ rr:column "publications" ;
                    rr:language "en-GB" ] ;
            rr:predicate foaf:publications ;
            rr:termType rr:Literal ],
        [ rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:second ] ;
            rr:predicate foaf:age ],
        [ rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:third ;
                    rr:language "dhhdhd" ] ;
            rr:predicate dbo:equipment ] ;
    rr:subjectMap [ rr:class foaf:ggg ] .

While the input file is

<#TripleMap1>
    rr:logicalTable [ rr:tableName "people" ] ;
    rr:subjectMap [ rr:class foaf:ggg ];
    rr:predicateObjectMap [   rr:predicate foaf:publications ;
                              rr:termType rr:Literal;
                              rr:objectMap [ rr:column "publications" ;
                                           rr:language "en-GB" ] ;
                            ];
    rr:predicateObjectMap
        [   rr:predicate foaf:age;
            rr:objectMap [ rr:column "age" ;
                         rr:datatype xsd:second ] ;
            ];
    rr:predicateObjectMap
        [   rr:predicate dbo:equipment;
            rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:third;
                         rr:language "dhhdhd"] ; ] ;
.

The layout of the graph is changed by the serialize() method.

Any help would be gratefully appreciated.

What do you mean by problem? This is valid Turtle syntax, and rdflib tries to use the most compact form and makes use of Turtle language features. In Turtle, a simple comma can be used between the objects of triples that share the same subject and predicate. — UninformedUser, Oct 02 '20 at 10:30
This is a problem for my use case, as the graph file is uploaded and, when it's returned to the user, the layout has changed. This may confuse them when examining the changes that have been made. — Alex, Oct 02 '20 at 10:37
I see, but I guess then you're a bit lost with Turtle. I mean, the compact notation is one of the nice features of Turtle, and now you want a serializer that uses only part of it, i.e. you want the shared-subject notation with the semicolon delimiter but not the shared subject-and-predicate shortcut with the comma. I doubt you can force `rdflib` to do this, but it's open source, so you could adapt the code and hopefully just "disable" the compact notation for multiple objects with the same subject and predicate. — UninformedUser, Oct 02 '20 at 14:58

score 1 · Answer 1 · answered Oct 23 '20 at 02:44

The comments by @UninformedUser are correct: you're asking for something that the Turtle syntax wasn't designed for. I've seen this issue, where different serializations of the same data confuse people, come up a few times. Turtle isn't like JSON, XML and other formats whose content can be sorted in a particular way. This is because, fundamentally, there is no ordering in RDF graphs: a graph is a set of triples. It is not possible to know, and thus repeatedly use, a single order for peer Blank Nodes, for instance.

Your various Turtle files are isomorphic which, in graph terms, is as equal as things get!

One partial solution is to implement a semi-deterministic serializer that orders things in particular ways, but this will always make assumptions about Blank Node IDs and so on. You could build such a serializer on top of RDFLib's serializer: it would take the RDFLib-serialized file (Turtle, N3, etc.) and sort it in some way. I've personally implemented such a sorter for Git diffing, and it sorted the Blank Nodes by a hash of their property values. You could rely on this for a specific scenario, but perhaps not as a serializer for data in general.

You could also look at ways of communicating RDF data to your users that don't depend on the static structure of a Turtle file. You could write a small function that counts things in your graphs and reports on that basis for comparison, e.g.:

1 x rr:logicalTable
1 x rr:subjectMap
...
2 x rr:predicateObjectMap

Or, a more domain-specific thing:

list the rr:tableName & rr:column values from your data in some fixed format that allows for easier comparison.

Some scenario-specific reporting, rather than general Turtle, is my ultimate suggestion.

A more general, but harder, approach could be to use a constraint-testing system like SHACL to inspect small graphs, like your Turtle files, and present/order/validate them in certain ways. Validation is SHACL's main use case, but it also has a presentation bent to it.
