I want to extract all statements from DBpedia's dump file.

Is it possible to write a SPARQL query that extracts the list of predicates whose values are dates (like releaseDate, deathDate, birthDate, ...)?

Amir Pournasserian

1 Answer

You can write a SPARQL query (you tagged the question with SPARQL, so presumably that's how you want to query for these things) that finds these kinds of properties. All you need to do is query for things that are owl:DatatypeProperties (since dates should be literals), and then filter based on their string representation. For instance:

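# The DBpedia endpoint predefines this prefix; declare it when querying elsewhere.
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Datatype properties whose IRI contains "Date" or "date".
select ?p where {
  ?p a owl:DatatypeProperty .
  filter( contains( str(?p), "Date" ) || contains( str(?p), "date" ))
}
limit 100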

Now, that will return any property whose string form contains the strings “Date” or “date”. You'll find that most of those are the kind of things you're looking for. However, a better way to do this might be to search for things that have xsd:date as their range, using a query like this:

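# Predefined on the DBpedia endpoint; declare these when querying elsewhere.
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Datatype properties whose declared range is xsd:date.
select ?p where {
  ?p a owl:DatatypeProperty ;
     rdfs:range xsd:date .
}
limit 100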

This has the advantage that you'll get properties whose values should be dates, even if their names don't include “date”.

Joshua Taylor
  • Thanks, can you add the English label of the property to the query result? – Amir Pournasserian Sep 04 '13 at 05:31
  • @AmirPournasserian All you need to do to get the English rdfs:label is add `rdfs:label ?label ;` (e.g., between `a owl:DatatypeProperty ;` and `rdfs:range xsd:date .`). The label isn't essential to answering the question as asked, so I'd rather keep the code in the answer minimal and direct. – Joshua Taylor Sep 04 '13 at 13:11
  • I extracted the "Raw Infobox Properties" data from DBpedia. All the predicates are "property" (not "ontology"). How can I join them? (I know I'm asking too much in comments, but it could be a small hint.) – Amir Pournasserian Sep 16 '13 at 17:02
  • @AmirPournasserian I'm sorry, I'm not really clear what you're asking. A new question would probably be best, so that you can write it out in full. DBpedia does have two datasets, though: one is the raw data from the infoboxes, and is less clean; the other is the DBpedia ontology, which is more structured, consistent, and uses a different namespace. The [**Infobox Data** section](http://wiki.dbpedia.org/Datasets) from the documentation has more details. – Joshua Taylor Sep 16 '13 at 17:26
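
For reference, the label suggestion from the comments could be sketched as the query below. It takes the range-based query from the answer and adds `rdfs:label ?label ;` where the comment suggests. The `langMatches` filter is an assumption added here to keep only English labels; it is not part of the original answer, and it presumes the labels carry language tags. The `PREFIX` lines are only needed outside endpoints that predefine them.

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Datatype properties with range xsd:date, together with their labels.
select ?p ?label where {
  ?p a owl:DatatypeProperty ;
     rdfs:label ?label ;
     rdfs:range xsd:date .
  # Assumption: labels are language-tagged; keep only the English ones.
  filter( langMatches( lang(?label), "en" ) )
}
limit 100
```

Without the filter, a property that has labels in several languages would appear once per label.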