I am dealing with name disambiguation issues. I'm wondering whether there's a way to find all the common variants of a name by using the web to 'crowdsource' them.

For instance, my data contains the term 'UC Berkeley'. Can I use a Google search (or some other type of application) to find all common synonyms of 'UC Berkeley', such as 'University of California Berkeley', 'Berkeley', 'UCB', etc.?

I realize this might not fit well as a Stack Overflow question. I'm more than willing to repost it in a different location or forum; please just tell me where.

user3314418
  • I think DBpedia might be useful here – Pierre Jun 19 '14 at 22:29
  • For those who are clicking -1, can you let me know what other forums there are to ask these types of questions? Once I get an answer, I'll delete my question. – user3314418 Jun 19 '14 at 23:03
  • Please explain the downvote. I don't understand why people shouldn't ask that kind of open question. We can have a constructive discussion here. – Pierre Jun 19 '14 at 23:12
  • @user3314418 I've found it hard to find the right place for those questions as well. Try LinkedIn groups; there are plenty for text analytics/NLP discussions. On Stack Overflow, avoid tags like "python" :) – Yasen Jun 20 '14 at 09:10

1 Answer

You can use Freebase. For example, the 'University of California, Berkeley' page: https://www.freebase.com/m/02zd460

has a field: /common/topic/alias

which lists different common names for this university, although some of them might be noisy:

UC Berkeley
Cal
Università della California (Berkeley) it
Universiteit van Californië - Berkeley nl
Universitato de Kalifornio, Berkeley eo
Berkeley
University of California, Berkeley Campus
University of California, Berkeley main campus
Berkeley Üniversitesi tr
California tr
加州大學柏克萊分校 zh-CN
Університет Каліфорнії uk
加州大学伯克利分校 zh-CN
Калифорнийски университет, Бъркли bg
University of California, Berkeley pl
Universiteit van Californië - Berkeley nl
Universitat de Califòrnia a Berkeley ca
Πανεπιστήμιο της Καλιφόρνιας, Μπέρκλεϋ el
加州大學柏克萊分校 zh-TW
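
Since the list is noisy, here is a minimal sketch of how such an alias list might be turned into a lookup table for disambiguation. It assumes the aliases are available as plain strings with an optional trailing language code, as in the list above. The function names (`split_alias`, `normalize`, `build_lookup`) and the rule of keeping only untagged or `en` entries are my own illustrative choices, not part of any Freebase API.

```python
import re

# Assumption: a trailing token such as "it", "nl" or "zh-TW" is a language tag.
LANG_TAG = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")


def split_alias(line):
    """Split 'alias text xx' into (alias, lang); lang is None when untagged."""
    parts = line.strip().rsplit(None, 1)
    if len(parts) == 2 and LANG_TAG.match(parts[1]):
        return parts[0], parts[1]
    return line.strip(), None


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def build_lookup(canonical, alias_lines, keep_langs=(None, "en")):
    """Map each normalized alias in an accepted language to the canonical name."""
    lookup = {normalize(canonical): canonical}
    for line in alias_lines:
        alias, lang = split_alias(line)
        if lang in keep_langs:
            lookup[normalize(alias)] = canonical
    return lookup


# A few entries copied from the alias list above.
aliases = [
    "UC Berkeley",
    "Cal",
    "Berkeley",
    "University of California, Berkeley Campus",
    "Università della California (Berkeley) it",
    "California tr",
    "加州大學柏克萊分校 zh-TW",
]

canonical = "University of California, Berkeley"
lookup = build_lookup(canonical, aliases)

print(lookup.get(normalize("uc berkeley")))  # resolves to the canonical name
print(lookup.get(normalize("California")))   # None: the 'tr' entry was dropped
```

Filtering on the language tag removes entries such as 'California tr', which would otherwise map a very ambiguous string to the university. Short aliases like 'Cal' still need manual review.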
Daniel
  • This is really great, Daniel! Do you happen to have a sample Python snippet for accessing Freebase's information? It seems they have an API. – user3314418 Jun 21 '14 at 15:49
  • You can use Google APIs to query Freebase. There are many ways to do this. Watch this first: https://www.youtube.com/watch?v=m6EdVYt9rgs – Daniel Jun 23 '14 at 22:33