I'm trying to generate the following XML using rdflib:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="#TNFalpha_944">
    <dcterms:modified rdf:parseType="Resource">
        <dcterms:W3CDTF rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-02T17:07:46.407152</dcterms:W3CDTF>
    </dcterms:modified>
  </rdf:Description>
</rdf:RDF>

However, I only manage to produce the following output, which has an additional rdf:Description element between dcterms:modified and dcterms:W3CDTF. It also does not seem possible to include rdf:parseType="Resource" on the <dcterms:modified> tag.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="#TNFalpha_944">
    <dcterms:modified>
      <rdf:Description rdf:nodeID="N947975008f3148c88ca6d2e3fd93f58f">
        <dcterms:W3CDTF rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-02T17:07:46.407152</dcterms:W3CDTF>
      </rdf:Description>
    </dcterms:modified>
  </rdf:Description>
</rdf:RDF>

The code to reproduce my issue is as follows:

import rdflib
from rdflib.namespace import DCTERMS, XSD
from datetime import datetime

graph = rdflib.Graph()

# Bind prefixes so the serializer uses dcterms: and xsd: in the output
graph.bind('dcterms', DCTERMS)
graph.bind('xsd', XSD)
description = rdflib.URIRef('#TNFalpha_944')

# Blank node that holds the W3CDTF value
w3cdtf_node = rdflib.BNode()

date = rdflib.Literal(datetime.now(), datatype=XSD.dateTime)
graph.add((description, DCTERMS.modified, w3cdtf_node))
graph.add((w3cdtf_node, DCTERMS.W3CDTF, date))

# serialize() returns bytes in rdflib 5 and str in rdflib 6+
ann = graph.serialize(format="pretty-xml")
if isinstance(ann, bytes):
    ann = ann.decode('utf-8')
print(ann)

I have the impression I'm missing something really obvious, but after a few hours going through the rdflib documentation and other forums, I can't manage to get rid of this second rdf:Description tag. What am I missing?

Thank you very much in advance.

Update:

I think this has something to do with omitting blank nodes:

https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-parsetype-resource

So, since it is not possible to add the rdf:parseType="Resource" attribute to <dcterms:modified>, rdflib generates an extra level of rdf:Description, just like in this other example:

http://etutorials.org/Misc/Practical+resource+description+framework+rdf/Chapter+9.+RDF+and+Perl+PHP+and+Python/9.3+RDF+and+Python+RDFLib/

I wonder if this is a limitation of the library or if there is a proper way to code this to generate the desired output.

Carlos Vega
  • For what it's worth: both documents represent the exact same RDF model. It's just written down slightly differently, but there is no change in the meaning of your data. – Jeen Broekstra Sep 03 '20 at 08:48
  • Yes, technically they should represent the same information, but I wonder if there is any way to prevent this from happening, as libraries in other languages seem to do. – Carlos Vega Sep 03 '20 at 12:35
