I'm looking to get all infobox properties and values from Wikipedia pages (only the ones that contain an infobox). Parsing the infobox from the page HTML fetched with a simple WebRequest gives me too much junk. Therefore, I am considering using DBpedia and getting the data with Jena (SPARQL). How can I do that? Is there a simple query that will give me all properties as key-value pairs? Or should I fetch the RDF and then convert it to what I need?
2 Answers
0
There is a public SPARQL endpoint for DBpedia at http://dbpedia.org/sparql which you can use to experiment. Examples and various other tools for building queries are described at http://wiki.dbpedia.org/OnlineAccess. You can also download the datasets and run queries locally.
Just to clarify: RDF is the data format that DBpedia is published in, SPARQL is the query language for querying RDF, and Jena is a specific implementation that includes RDF datastore(s) and a SPARQL engine.
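As a rough illustration of the "all properties as key-value pairs" part of the question, here is a minimal sketch in Python using only the standard library. It assumes that the raw infobox properties live under the `http://dbpedia.org/property/` namespace, that the resource name is the Wikipedia page title with underscores, and that the endpoint honours the standard SPARQL JSON results format. The function names are made up for this example; the same query string could equally be sent from Jena.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"
# Assumption: raw infobox properties are published under this namespace.
INFOBOX_NS = "http://dbpedia.org/property/"


def build_query(resource_name):
    """Return a SPARQL query listing infobox properties and values for one resource."""
    resource = "http://dbpedia.org/resource/" + resource_name
    return (
        "SELECT ?property ?value WHERE { "
        "<" + resource + "> ?property ?value . "
        'FILTER(STRSTARTS(STR(?property), "' + INFOBOX_NS + '")) '
        "}"
    )


def bindings_to_pairs(results):
    """Flatten a SPARQL JSON result document into (property, value) tuples."""
    return [
        (row["property"]["value"], row["value"]["value"])
        for row in results["results"]["bindings"]
    ]


def fetch_infobox(resource_name):
    """Send the query to the public endpoint (requires network access)."""
    params = urllib.parse.urlencode({"query": build_query(resource_name)})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return bindings_to_pairs(json.load(response))
```

Calling `fetch_infobox("Berlin")` would return a list of (property URI, value) tuples if the endpoint is reachable. A page without an infobox should simply yield an empty list.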

DNA
Would the downvoter care to comment? Happy to improve the answer if you have found a problem with it. – DNA Feb 18 '14 at 16:36
0
Try using the dumps from http://wiki.dbpedia.org/Downloads37, for example the "Raw Infobox Properties" dataset.
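If you go the dump route, the triples can be turned into key-value pairs with a few lines of code. The following is only a sketch: it assumes the dump is in N-Triples format (one `<subject> <predicate> object .` statement per line), it does not decode escape sequences inside literals, and the function names are hypothetical. A proper RDF parser such as the one in Jena would be more robust.

```python
import re

# Matches "<subject> <predicate> object ." on one line; the object is parsed separately.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')
# Matches the quoted lexical form at the start of a literal, allowing backslash escapes.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"')


def parse_object(raw):
    """Strip IRI brackets, or return a literal's text without language tag or datatype."""
    if raw.startswith("<") and raw.endswith(">"):
        return raw[1:-1]
    match = LITERAL.match(raw)
    return match.group(1) if match else raw


def infobox_pairs(lines, subject):
    """Yield (property, value) pairs for every triple whose subject matches."""
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(1) == subject:
            yield match.group(2), parse_object(match.group(3))
```

Any iterable of text lines works as input, so an open file handle on the downloaded (and decompressed) dump can be passed directly, for instance `dict(infobox_pairs(handle, "http://dbpedia.org/resource/Berlin"))`. Note that building a dict keeps only the last value when a property occurs more than once.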

selitsky