-3

I was trying to import the Freebase RDF dump into Google Refine, but I got an error. How can I extract topic names with their notable types from the 18 GB RDF dump into CSV or a similar format? Is there a GUI tool for this?

pnuts
user2216267
  • What error are you getting? Why does it have to be a GUI tool? If all you want is notable type & name, I'd have thought a simple one line grep command would do it for you. – Tom Morris Jul 05 '13 at 03:08
  • It is not importing into Google Refine (*.gz size: 18 GB, uncompressed size: 146 GB). But what command do I type, and where? I'm not a Linux user. – user2216267 Jul 05 '13 at 08:27
  • A one-line grep command? – user2216267 Jul 05 '13 at 14:01

1 Answer

2

146 GB is too big for OpenRefine (ex-Google Refine) to handle. If there is a GUI tool that will do this out of the box, I'm not familiar with it, but since this is a programming Q & A site, I'll give a shell programming solution. You don't need to know anything about Linux, but you do need to know how to use Unix shell commands (you could use Cygwin on Windows).

 curl -L http://download.freebaseapps.com | gunzip | egrep 'notable_for|notable_type|rdfs:label'

will give you all the raw data that you need to assemble the solution. The lines with the key information look like the one below. They contain only IDs, so if you want labels/names, you'll need to substitute them for the subject/object IDs in the first and last columns.

ns:m.01nsxs2    ns:common.topic.notable_types   ns:m.0kpv17.
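
As a rough sketch of that substitution step, the filtered output can be saved to a file and joined with awk. This assumes the triples are tab-separated and end in `.` as in the sample line above, that English names appear as `rdfs:label` lines tagged `@en`, and that notable types have labels of their own. `notable_types_csv` and `filtered.txt` are made-up names for illustration. All English labels are held in memory, so the full dump may need a lot of RAM.

```shell
# Hypothetical helper: print "topic name","notable type" CSV rows from a file
# of filtered triples. The file is read twice: once for labels, once for types.
notable_types_csv() {
  awk -F '\t' '
    # Turn an N-Triples style \" escape into a plain quote, then double every
    # quote as CSV requires.
    function csvq(s) { gsub(/\\"/, "\"", s); gsub(/"/, "\"\"", s); return s }

    # Drop the trailing "." statement terminator from the object field.
    { sub(/\.$/, "", $3) }

    # Pass 1: remember the English label of every subject ID.
    NR == FNR {
      if ($2 == "rdfs:label" && $3 ~ /^".*"@en$/)
        label[$1] = substr($3, 2, length($3) - 5)  # strip leading " and trailing "@en
      next
    }

    # Pass 2: emit a row when both the topic and its type have a label.
    $2 == "ns:common.topic.notable_types" && ($1 in label) && ($3 in label) {
      printf "\"%s\",\"%s\"\n", csvq(label[$1]), csvq(label[$3])
    }
  ' "$1" "$1"
}
```

Usage, once the output of the pipeline above has been saved to `filtered.txt`: `notable_types_csv filtered.txt > notable_types.csv`.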
Tom Morris
  • I ran the command you provided, but how do I get clear text with topic name and notable type (e.g. Gmail: Software) in CSV? Currently it gives: `ns:g.1254yxnny ns:common.notable_for.display_name "Zeneszám"@hu. ns:g.1254yxnny ns:common.notable_for.display_name "Utwór muzyczny"@pl. ns:g.1254yxnny ns:common.notable_for.display_name "Nummer (muziek)"@nl. ns:g.1254yxnny ns:common.notable_for.display_name "संगीत ट्रैक"@hi.` – user2216267 Jul 05 '13 at 16:17