Freebase RDF dump parsing for Name-Type extraction

Question

I have parsed the Freebase data dump and now have RDF like the following:

<http://rdf.freebase.com/ns/m.0mspb64> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/music.release_track>
<http://rdf.freebase.com/ns/m.0mspb64> <http://rdf.freebase.com/ns/type.object.name> "Mit Rees und Hans im Bürgli"@de
<http://rdf.freebase.com/ns/m.0mspd6m> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/music.release_track>
<http://rdf.freebase.com/ns/m.0mspd6m> <http://rdf.freebase.com/ns/type.object.name> "Granny Scratch Scratch"@en

Given this RDF dataset, how can I extract the name and type of a particular resource? For instance, from the data above, I want to extract:

Mit Rees und Hans im Bürgli ### music.release_track
Granny Scratch Scratch ### music.release_track

score 2 · Accepted Answer · answered Feb 27 '14 at 23:24

What did you use to parse it? The format that you're showing is the raw data format.

If you've loaded it into an RDF store, you should be able to easily query to get the information you need using SPARQL or whatever other query interface the store offers.

If you're just working with the raw text file, you can take advantage of the fact that it's sorted by subject ID (you should verify that this is still true) to process it as a stream without requiring much working storage (i.e. RAM).

The only temporary storage that you need is 1) the current subject ID, 2) the name of the current subject, and 3) the type of the current subject. If the type isn't the one you want (release_track), you can just skip to the next group of subject triples. If it is the right type, you can output a line for that subject as soon as you have both the name and the type.
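
The streaming approach described above could be sketched in Python roughly as follows. This is only an illustration, not a tested tool for the full dump: it assumes the lines are grouped by subject, that the three fields are separated by whitespace (with or without a trailing " ."), and that escape sequences inside names don't need decoding. The names `parse_triple`, `name_type_pairs` and `wanted_type` are made up for this example. Since a subject can carry several names and types, the sketch emits every name/type combination for a subject, which is a choice you may want to change.

```python
import re
from itertools import groupby

NS = "http://rdf.freebase.com/ns/"
NAME_PRED = "<" + NS + "type.object.name>"
TYPE_PRED = "<" + NS + "type.object.type>"

# Quoted literal with an optional language tag or datatype.
# Escape sequences inside the quotes are left as-is (assumption: good enough here).
LITERAL_RE = re.compile(r'^"(.*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?$', re.S)


def parse_triple(line):
    """Split one line into (subject, predicate, object), or None if malformed."""
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        return None
    subject, predicate, obj = parts
    if obj.endswith("."):  # drop the N-Triples terminator when present
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def name_type_pairs(lines, wanted_type=None):
    """Yield (name, type) pairs, holding only one subject's triples at a time."""
    triples = (t for t in map(parse_triple, lines) if t is not None)
    # groupby only merges adjacent lines, so this relies on the subject ordering.
    for _subject, group in groupby(triples, key=lambda t: t[0]):
        names, types = [], []
        for _, predicate, obj in group:
            if predicate == NAME_PRED:
                match = LITERAL_RE.match(obj)
                if match:
                    names.append(match.group(1))
            elif (predicate == TYPE_PRED
                  and obj.startswith("<" + NS) and obj.endswith(">")):
                types.append(obj[len(NS) + 1:-1])  # keep e.g. music.release_track
        for name in names:
            for type_id in types:
                if wanted_type is None or type_id == wanted_type:
                    yield name, type_id


if __name__ == "__main__":
    sample = [
        '<http://rdf.freebase.com/ns/m.0mspd6m> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/music.release_track>',
        '<http://rdf.freebase.com/ns/m.0mspd6m> <http://rdf.freebase.com/ns/type.object.name> "Granny Scratch Scratch"@en',
    ]
    for name, type_id in name_type_pairs(sample, "music.release_track"):
        print(name + " ### " + type_id)  # Granny Scratch Scratch ### music.release_track
```

Because `name_type_pairs` accepts any iterable of lines, it can be fed an open file object directly; for the compressed dump, something like `gzip.open(path, "rt", encoding="utf-8")` should work without decompressing to disk first.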

Thanks for your reply, Tom. I parsed the entire dump using the grep command; since I'm only interested in Name-Type, I extracted only the desired data. I don't think I need an RDF store; as you said, I can use temporary storage and parse the data. I am planning to do it in Java. Do you have any suggestions on the choice of programming language? — Sreedhar GS, Feb 28 '14 at 08:37
Hopefully that was actually `zgrep` so you didn't have to deal with storing the decompressed data. If you used an OR (|) pattern which preserved the original subject grouping, you should have the two pieces of data on adjacent lines. For a quick program like this, I'd probably use Python, but Java will work fine too. — Tom Morris, Feb 28 '14 at 14:59
