I used AIDA (a named entity tool) to annotate a corpus and got output in this format:

2   Germany http://en.wikipedia.org/wiki/Germany    11867   /m/0345h
6   United_Kingdom  http://en.wikipedia.org/wiki/United_Kingdom 31717   /m/07ssc

Column 3 is the entity's Wikipedia URL and column 4 is its Wikipedia ID. Is there a way to map the URL or the ID to the Freebase MID shown in the last column? The last column was someone else's work; I have no idea how he did it and can't find a method anywhere else.

Here is the AIDA link: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/

Tom Morris
hidemyname
  • [Freebase.com was officially shut down on 2 May 2016](https://groups.google.com/forum/#!topic/freebase-discuss/WEnyO8f7xOQ). – Pablo Bianchi Dec 10 '17 at 05:20

2 Answers

It's easy to map from either of those English Wikipedia IDs to a Freebase topic and its various identifiers, including the MID, using either the Freebase API or the Freebase data dumps. Which one is better depends on the volume of data that you need to map.

All Wikipedia IDs are stored in the namespace rooted at /authority/wikipedia in Freebase. The numerical IDs (i.e. article numbers) are stored in /authority/wikipedia/en_id for the English Wikipedia, so you can use http://freebase.com/authority/wikipedia/en_id/11867 as one of the aliases for the Germany topic.
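
As a minimal sketch, assuming the annotation rows are whitespace-separated as in the question, the article number in column 4 can be turned into this kind of Freebase ID with a few lines of Python. The function names `en_id_key` and `parse_row` are illustrative only and are not part of AIDA or Freebase.

```python
def en_id_key(page_id):
    """Build the Freebase ID for an English Wikipedia article number."""
    return "/authority/wikipedia/en_id/{}".format(int(page_id))


def parse_row(line):
    """Split one annotation row into its first four columns.

    Assumption: columns are whitespace-separated and ordered as in the
    question (position, entity name, Wikipedia URL, article number).
    """
    position, name, url, page_id = line.split()[:4]
    return {
        "position": int(position),
        "name": name,
        "url": url,
        "page_id": int(page_id),
    }


row = parse_row("2   Germany http://en.wikipedia.org/wiki/Germany    11867")
print(en_id_key(row["page_id"]))  # /authority/wikipedia/en_id/11867
```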

All the other sub-namespaces are listed here: https://www.freebase.com/authority/wikipedia?ns= The other two that are relevant for English Wikipedia are en and en_title, both of which contain keys based on the alphabetic Wikipedia article names. The latter is the canonical ID and is unique, while the former contains that ID plus the IDs of all the redirect pages that point to it.

Both of these URLs are also aliases for Germany:

https://www.freebase.com/authority/wikipedia/en/Germany
https://www.freebase.com/authority/wikipedia/en_title/Germany

To use the MQLRead query API, construct a query like this:

[{
  "id": "/authority/wikipedia/en_id/11867",
  "mid": null,
  "name": null
}]

and parse the resulting JSON

{
  "result": [{
    "id": "/authority/wikipedia/en_id/11867",
    "mid": "/m/0345h",
    "name": "Germany"
  }]
}

to get the MID. The full query URL would look like this:

https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fauthority%2Fwikipedia%2Fen_id%2F11867%22%2C+%22mid%22%3A+null%2C+%22name%22%3A+null+%7D%5D
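
For many IDs, the query URL and the response parsing can be scripted. The following is a rough Python sketch, not a tested client: the helper names are made up for illustration, the endpoint is the one from the URL above, and no request is actually sent because Freebase has since been shut down. The sample response is the JSON shown earlier.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL above; it is no longer live.
MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread/"


def build_mqlread_url(page_id):
    """Build an MQLRead URL asking for the MID and name of an en_id key."""
    query = [{
        "id": "/authority/wikipedia/en_id/{}".format(int(page_id)),
        "mid": None,   # serialized as JSON null: "fill this in"
        "name": None,
    }]
    params = {"lang": "/lang/en", "query": json.dumps(query)}
    return MQLREAD_URL + "?" + urlencode(params)


def extract_mid(response_text):
    """Return the MID from an MQLRead JSON response, or None if no result."""
    results = json.loads(response_text).get("result") or []
    return results[0]["mid"] if results else None


sample = ('{"result": [{"id": "/authority/wikipedia/en_id/11867", '
          '"mid": "/m/0345h", "name": "Germany"}]}')
print(build_mqlread_url(11867))
print(extract_mid(sample))  # /m/0345h
```

The generated URL differs from the one above only in insignificant whitespace inside the encoded query.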

You could do the same thing with the alpha keys in the other namespaces, but the keys need to be escaped for special characters and it's not worth the hassle to describe it since you've got the numeric identifiers. MQL Key Escaping is described here if anyone else needs it: http://wiki.freebase.com/wiki/MQL_key_escaping

Tom Morris
  • Hi. May I ask another question? Since I need to run queries for about 26G of data, the Freebase API has query limits, and Freebase is shutting down, I am thinking about downloading all the Freebase data and querying it offline. Do you know how to download it? Thank you!! – hidemyname Jun 25 '15 at 06:25
  • Sure, downloading the database is easy and instructions are the first hit for every search query that I can think of. What did you search for that failed to find the Freebase data dump? – Tom Morris Jun 25 '15 at 14:45
  • Hi, thanks. I found that page. But at that time I knew nothing about RDF. Now I am studying RDF and how to query the dump data. Thanks! – hidemyname Jun 26 '15 at 06:40
You could query Freebase with the Wikipedia info; see the Freebase API docs. Query on the /common/topic/topic_equivalent_webpage property. However, Freebase will be shutting down in the near future, so I don't recommend putting much effort into that.

akb