
Say I have the following MQL query:

[{
  "id": null,
  "name": null,
  "type": "/base/givennames/given_name",
  "sort": "name",
  "limit": 100
}]

I get a list of the first 100 names sorted alphabetically:

[
  {
    "id": "/wikipedia/fr/Ay$015Fe",
    "name": "A'isha",
    "type": "/base/givennames/given_name"
  },
  {
    "id": "/en/aadu",
    "name": "Aadu",
    "type": "/base/givennames/given_name"
  },
  {
    "id": "/m/0g9wn3v",
    "name": "Aage",
    "type": "/base/givennames/given_name"
  },
  {
    "id": "/en/aakarshan",
    "name": "Aakarshan",
    "type": "/base/givennames/given_name"
  },
  ...
]

Is there a way to get the 100 most relevant / common / important names instead?

I want to do this for a number of queries, not just given names - so I am not exactly sure how to define the relevancy metric. Perhaps by sorting by the number of inbound links to the id with a subquery?

The search API returns a score element, but I believe it's a relevancy metric related to the search query term (null in this case). I just started with MQL yesterday and I have no idea if this is possible.

Günter Zöchbauer
Alix Axel

1 Answer


You can do this with the Search API like this:

https://www.googleapis.com/freebase/v1/search?filter=(all+type:/base/givennames/given_name)&limit=100

This will give you a list of 100 given names. We don't give out the exact details of how they're ordered but the number of links is definitely a factor.
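
As a rough sketch of how the same request could be built for other types, the snippet below assembles the Search API URL from a type id and a limit. The endpoint and the `filter` and `limit` parameters are taken from the URL above; the function name `build_search_url` is only illustrative and not part of any Freebase client library.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL in the answer above.
SEARCH_ENDPOINT = "https://www.googleapis.com/freebase/v1/search"


def build_search_url(type_id: str, limit: int = 100) -> str:
    """Build a Search API URL listing topics of one type.

    build_search_url is a hypothetical helper name. The ordering of the
    results is decided by the API itself, not by anything in this URL.
    """
    params = {
        "filter": f"(all type:{type_id})",  # same filter syntax as above
        "limit": limit,
    }
    # urlencode percent-encodes the parentheses, colon and slashes.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("/base/givennames/given_name"))
    print(build_search_url("/location/country", limit=20))
```

Swapping the type id is the only change needed per list, which fits the case of running the same query for names, countries, cities and so on.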

Shawn Simister
  • Thanks Shawn, that's what I thought. It seems that the most relevant names are scored in a historical context and not in a contemporary one. Either way, is it not possible to do the same with the MQLread API? 100 results may be too few for some types. – Alix Axel Oct 22 '13 at 08:56
  • The results are scored in the context of all data available in Freebase, which I guess you could describe as "historical". In order to rank them in a "contemporary" context you'll need to figure out what that means for your application. For example, the US Census Bureau provides statistics on name frequencies (http://www.census.gov/genealogy/www/data/1990surnames/names_files.html) from the 1990 census. This would give you a reasonable distribution for American adult names right now but wouldn't take into account children born since 1990. – Shawn Simister Oct 23 '13 at 17:12
  • I'm still experimenting with Freebase and I don't know what it is (or isn't) capable of producing. A little background: I wanted to compile (and preferably keep updated) a couple of lists of common / relevant things (for our contemporary, day-to-day context). These lists would include ontologies such as (among maybe a dozen or so more): names (family and given), countries, cities, animals, fruits, etc. I know I could compile and curate these manually using Freebase and other sources, but I was hoping to do most of the work without having to maintain the lists myself. – Alix Axel Oct 24 '13 at 01:10
  • For instance, this [MQL query for `/location/country`](https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22type%22%3A+%22%2Flocation%2Fcountry%22%2C+%22name%22%3A+null%2C+%22sort%22%3A+%22name%22+%7D%5D) returns results that I wasn't expecting. I understand that *Abbasid Caliphate* was a country with historical significance and deserves to be categorized as such, but I can't seem to find a way to return only *current* countries - I would expect some sort of `established_on` / `recognized_on` fields to be present, but unfortunately no. – Alix Axel Oct 24 '13 at 01:12
  • Do you think I should curate these lists myself, by hand? Or is there any "hidden" way to do it with Freebase (or some other semantic knowledge data-source for that matter)? – Alix Axel Oct 24 '13 at 01:12
  • Thanks for the link on the name census BTW, that certainly is helpful! – Alix Axel Oct 24 '13 at 01:13
  • Many people use Freebase as a gazetteer of names for people, places and things that can be used for natural language programming. The challenge in your case is that you only want a subset of these and you want them ranked according to your own criteria. This is certainly something that's possible but it will involve more effort on your part. To get a list of current countries you'll need to consider data from the Dated Location type: https://www.freebase.com/location/dated_location – Shawn Simister Oct 24 '13 at 01:23
  • It's pretty easy to grep through the Freebase RDF dumps and get all these lists of names by type of entity. Then you need to filter that data against some other dataset like the Census data in order to rank it or remove historical entities. Another good dataset for ranking entities is the Wikipedia page view data (http://dumps.wikimedia.org/other/pagecounts-raw/). Many of the top entities in Freebase are linked to Wikipedia, so you can do a join against this dataset to measure how popular they are at a given time. – Shawn Simister Oct 24 '13 at 01:31
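
The join against page view data described in the last comment could look roughly like the sketch below. It assumes each pagecounts line has four space-separated fields (project code, page title, view count, bytes transferred) and that an entity name maps to a Wikipedia title by replacing spaces with underscores. Both are simplifying assumptions, and in practice the Wikipedia keys stored in Freebase would be a more reliable join key. The function names and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def count_views(lines: Iterable[str], project: str = "en") -> Counter:
    """Sum view counts per page title for one project.

    Assumes lines of the form: "<project> <title> <views> <bytes>".
    Lines that do not match that shape are skipped.
    """
    views: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue
        _, title, count, _ = parts
        if count.isdigit():
            views[title] += int(count)
    return views


def rank_by_views(names: Iterable[str], views: Mapping[str, int],
                  limit: int = 100) -> List[str]:
    """Return up to `limit` names, most viewed first.

    Names with no matching page count as zero views and keep their
    input order, because sorted() is stable.
    """
    def key(name: str) -> int:
        return views.get(name.replace(" ", "_"), 0)

    return sorted(names, key=key, reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical sample lines, not real page view figures.
    sample = [
        "en Aage 12 3400",
        "en Aadu 3 900",
        "de Aage 50 100",
        "en Aage 8 2100",
    ]
    names = ["Aadu", "Aage", "Aakarshan"]
    print(rank_by_views(names, count_views(sample)))
```

The names list would come from the RDF dump (or an MQL query) for a given type, and the page view lines from whichever hourly files cover the period of interest, so the same ranking step can be reused for each list.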