
According to Freebase, it has 23,407,174 topics. What is the easiest way to get the UI-friendly names (essentially the 'text' attribute of the topic JSON; an example of a single topic's JSON is here) of all of these topics? I don't need any other metadata.

Brian Tompsett - 汤莱恩
Arman

2 Answers

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 2 > freebase-topic-names.txt

although you probably want the Freebase IDs as well so that you know what the names refer to:

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 1,2

Two additional bits of postprocessing are needed:

  1. Tabs are escaped as \t
  2. The string \N represents a null (non-existent) name
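Those two steps could be handled with a small awk filter. This is only a sketch: clean_topic_names is a made-up helper name, and it assumes the input is the two-column "ID<TAB>name" output of the cut -f 1,2 command above and that the only conventions to undo are the two listed here.

```shell
# Hypothetical helper (not part of any Freebase tooling).
# Reads "id<TAB>name" lines on stdin, assuming the two conventions above:
#   - a name that is exactly \N means the topic has no name -> row is dropped
#   - a literal backslash-t inside a name stands for a tab   -> unescaped
clean_topic_names() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    $2 == "\\N" { next }               # skip topics with a null name
    {
      gsub(/\\t/, "\t", $2)            # turn the escaped \t into a real tab
      print $1, $2
    }'
}
```

It can be appended to the second pipeline above as "| clean_topic_names > freebase-topic-names.tsv" (the output filename is just an example). Since an unescaped name may itself contain tabs, anything reading the result should split each line on the first tab only.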
Tom Morris

Take a look at the Simple Topic Dump that we provide. It's over a GB of compressed data, but it's still faster to download than trying to get all the names through the API.

Shawn Simister