
More than a few times I've wanted to programmatically pick the better of two words or phrases, using frequency of use on the Internet as a heuristic.

The obvious way, and the way to do it manually, is to enter each term into a search engine and note how many "hits" each one gets.

But the big search engines have either deprecated their search APIs or limit free use to 100 queries per day, even with an API key. That's not great if you're working on a free project. The big search engines also have a "no scraping" clause in their terms of service.

I need it to work for arbitrary, perhaps even unidentified, languages, and from a device with limited storage. This rules out having a local corpus or database.


One area of application is tools for Wiktionary editors, helping them choose the main spelling among several variants even if they don't know the language. The one I have in mind right now uses frequency as a heuristic to help choose the best conversion between a spelling in a foreign script and a lossy transliteration in the Latin alphabet.
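
To make the idea concrete, here is a rough sketch of the selection step I'm after. The names `pick_most_frequent` and `hit_count` are placeholders of my own: `hit_count` stands in for whatever frequency source turns out to be usable, which is exactly the part I don't have.

```python
from typing import Callable, Iterable


def pick_most_frequent(candidates: Iterable[str],
                       hit_count: Callable[[str], int]) -> str:
    """Return the candidate with the highest hit count.

    `hit_count` is a hypothetical callable that maps a term to some
    frequency estimate. Ties go to the earliest candidate, because
    max() returns the first maximal item it encounters.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=hit_count)


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not real search results.
    fake_counts = {"colour": 120, "color": 450}
    best = pick_most_frequent(["colour", "color"],
                              lambda term: fake_counts.get(term, 0))
    print(best)  # prints "color" with the made-up counts above
```

Passing the frequency source in as a function keeps the comparison independent of where the counts come from, so any service or data set could be swapped in later.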

hippietrail