
I have an RDF file, for example:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dbp="http://dbpedia.org/ontology/"
xmlns:dbprop="http://dbpedia.org/property/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
      <dbp:birthDate>1685-03-21</dbp:birthDate>
      <dbp:deathDate>1750-07-28</dbp:deathDate>
      <dbp:birthPlace>Eisenach</dbp:birthPlace>
      <dbp:deathPlace>Leipzig</dbp:deathPlace>
      <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
      <foaf:name>Johann Sebastian Bach</foaf:name>
      <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    </rdf:Description>
</rdf:RDF> 

and I'd like to extract only the textual parts of this file, i.e., my output in this case would be:

output_text = "Johann Sebastian Bach, German composer and organist, 1685-03-21, 1750-07-28, Eisenach, Leipzig"

How can I get this result using RDFlib?

Marcelo
  • Is the order of the text important? Strings (in general, literals) will only appear as the objects of RDF statements; they can't be subjects or predicates. So you can simply iterate through the statements of the model, and concatenate (with ", " interspersed, it seems) the string literals that occur as objects of the statements. Does this sound like what you're looking for? – Joshua Taylor Oct 11 '13 at 12:10
  • Thanks Joshua! The order of the text is not important. The commas in the output are just there to separate one literal from another, and I don't need them either. Yes, I realize that only the objects (Literals) contain the strings I'm looking for, but I don't see how I can extract these strings from the objects and put them into a string variable. Could you give me an example? – Marcelo Oct 11 '13 at 12:32

2 Answers


Building on Joshua Taylor's answer, the method you are looking for is toPython(), which the docs describe as follows: "Returns an appropriate python datatype derived from this RDF Literal". This snippet should return what you are looking for:

raw_data = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dbp="http://dbpedia.org/ontology/"
xmlns:dbprop="http://dbpedia.org/property/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
      <dbp:birthDate>1685-03-21</dbp:birthDate>
      <dbp:deathDate>1750-07-28</dbp:deathDate>
      <dbp:birthPlace>Eisenach</dbp:birthPlace>
      <dbp:deathPlace>Leipzig</dbp:deathPlace>
      <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
      <foaf:name>Johann Sebastian Bach</foaf:name>
      <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    </rdf:Description>
</rdf:RDF>"""
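
import rdflib

graph = rdflib.Graph()
# raw_data is RDF/XML, so name the format explicitly; recent rdflib
# versions do not assume XML for string input.
graph.parse(data=raw_data, format="xml")

output = []

for s, p, o in graph:
    # Keep only literal objects; URIs and blank nodes are skipped.
    if isinstance(o, rdflib.term.Literal):
        # str() keeps join() safe if toPython() returns a non-string
        # (e.g. a date) for a typed literal.
        output.append(str(o.toPython()))

print(', '.join(output))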
Ted Lawless

This is relatively straightforward, at least in terms of the conceptual task. You need to

  • read the RDF document into an rdflib Graph
  • iterate through the statements (triples) in the graph
    • if the statement's object is a literal
    • then concatenate the lexical form of the literal into the string that you're building

I'm not much of a Python user, and so not much of an RDFLib user either, but these steps shouldn't be all that difficult. Getting started with RDFLib (from the RDFLib documentation) shows how you can read a graph and iterate over its triples:

import rdflib

g = rdflib.Graph()
result = g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print((s, p, o))

Now, instead of print((s, p, o)) in the body of that for loop, you'll need to check whether o is a literal (an instance of rdflib.term.Literal). If there are literals of non-string types, you will want to either concatenate their lexical forms, or concatenate only the string-like literals: plain literals (literals with no language tag and no datatype), the string part of literals with language tags, and the lexical form of literals whose datatype is xsd:string.

Joshua Taylor