
Say, for example, a person enters the query "d dark knight rses". I want to find the nearest Wikipedia page, which is http://en.wikipedia.org/wiki/The_Dark_Knight_Rises

What are possible ways to do that?

One simple way I can think of is to search Google for the query with the term "wikipedia" appended, then take the first Wikipedia page in the results. If there is no Wikipedia page even in the top 5 pages, return "Sorry".

But is there a more convenient method or API call that avoids using Google?

Edit: CLOSEST. For example, "d dark night" might result in "The Dark Night" or "The Dark Knight". Both are valid answers. Although the former is closer to the query, I think the latter is the better answer because it is what the user most likely meant.

w2lame
  • Hi there. About the Google search: you can "force" Google to search only Wikipedia by writing `site:wikipedia.org` followed by the string you want to search for, but you can still get some false positives that way. – TheNewOne Aug 18 '12 at 20:12
  • You can always use Bing Developer or Yahoo Boss API – amit Aug 18 '12 at 20:14
  • Using a Bing Developer API is good. I heard that they are moving to azure though. Let me check. – w2lame Aug 18 '12 at 20:42
  • You need to be more specific. Do you want a user to enter a query in your search box, search Wikipedia or Google for the top 5 matches, and return the results if anything matches? – user1608656 Aug 18 '12 at 20:11
  • I didn't understand your question. Let us assume that we somehow have this query, and I want to find the closest wikipedia page to it. – w2lame Aug 18 '12 at 20:43
  • @w2lame: closest according to what distance? – carlosdc Aug 19 '12 at 06:04
  • Added this in the question details. – w2lame Aug 20 '12 at 18:20

1 Answer


You could use the official Wikipedia API. Here is an example of an `opensearch` call with the query "dark night":

$ curl "https://en.wikipedia.org/w/api.php?action=opensearch&search=dark%20night"

This returns:

[
    "dark night", 
    [
        "Dark Night", 
        "Dark Night of the Soul", 
        "Dark Night of the Soul (album)", 
        "Dark Night of the Scarecrow", 
        "Dark Night (song)", 
        "Dark Night (film)", 
        "Dark night rises", 
        "Dark night (roller coaster)", 
        "Dark night sky paradox"
    ]
]
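
The same call can be made from a program. Below is a minimal Python sketch, assuming only the standard library; the function names (`build_opensearch_url`, `parse_titles`, `title_to_url`, `search_titles`) and the `User-Agent` string are made up for illustration, and the parsing assumes the response has the shape shown above (the query first, then the list of titles).

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # endpoint from the curl call above


def build_opensearch_url(query, limit=10):
    """Build the opensearch URL; `limit` caps the number of suggestions."""
    params = {"action": "opensearch", "search": query,
              "limit": limit, "format": "json"}
    return API_URL + "?" + urlencode(params)


def parse_titles(body):
    """Extract the suggested titles from an opensearch JSON response."""
    data = json.loads(body)
    # data[0] echoes the query; data[1] holds the suggested page titles.
    return data[1] if len(data) > 1 else []


def title_to_url(title):
    """Turn a page title into an article URL (spaces become underscores)."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def search_titles(query, limit=10):
    """Call the API over the network and return the suggested titles."""
    request = Request(build_opensearch_url(query, limit),
                      # Hypothetical client name; replace with your own.
                      headers={"User-Agent": "opensearch-sketch/0.1 (example)"})
    with urlopen(request, timeout=10) as response:
        return parse_titles(response.read().decode("utf-8"))
```

Calling `search_titles("dark night")` needs network access; picking the first returned title and passing it to `title_to_url` gives a candidate page, and an empty list would correspond to the "Sorry" case from the question.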

UPDATE: Another approach is to download a Wikipedia data dump and search it locally.
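
As a rough sketch of what local searching could look like, the Python below does fuzzy matching over a list of page titles with the standard library's `difflib`. It assumes the titles have already been extracted from the dump into a plain text file with one title per line; that file layout, and the names `load_titles` and `closest_titles`, are assumptions for illustration, not something the dump or the answer prescribes.

```python
import difflib


def load_titles(path):
    """Read one title per line (assumed layout), turning underscores into spaces."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().replace("_", " ") for line in handle if line.strip()]


def closest_titles(query, titles, n=3, cutoff=0.6):
    """Return up to `n` titles most similar to `query`, best match first."""
    # Compare case-insensitively, but return the original spelling.
    by_lower = {}
    for title in titles:
        by_lower.setdefault(title.lower(), title)
    matches = difflib.get_close_matches(query.lower(), list(by_lower),
                                        n=n, cutoff=cutoff)
    return [by_lower[match] for match in matches]
```

For instance, `closest_titles("d dark knight rses", load_titles("titles.txt"))` (with a hypothetical `titles.txt`) ranks titles by string similarity only. Note that `get_close_matches` compares the query against every candidate, so on the full set of Wikipedia titles this is likely too slow without some kind of index, and pure string similarity will not capture the "what the user most likely meant" notion of closeness from the question.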

Tomasz Nurkiewicz