I just did a conversion search on Google for "15lbs in kg" and the first hit is http://www.trueknowledge.com/q/what_is_15_kg_in_lbs

I can then change 15 to ANY number, including decimals, and I always get trueknowledge as the first hit, with a direct link to their page for converting that number.

I can imagine that you could build something like this fairly easily by automatically linking to the next number on every page, and they also seem to do this by providing "questions like yours" links. For this example it's quite easy, but I've seen many other cases where you search for something arbitrary only to land on another search page that provides its own crappy search results for that exact search phrase.

Is this just based on generating links by guessing phrases and feeding them to Google's crawler, or how is it done?

I'm not interested in creating a clone of these sites; I truly hate them. I'm just curious about how it's done and whether Google is trying to prevent it in some way. For the conversion, where they provide a good result, I don't mind, but when I end up on another search page it's really annoying.

tshepang
Rabarber
  • Not trying to play the smart***, but is it a good idea to give tips about how to build such automated sites? – methode Oct 04 '10 at 12:15

1 Answer

Actually, "I can then change 15 to ANY number" is not true. For example, right now a search for "15lbs in kg" gives http://wiki.answers.com/Q/How_much_is_15_lbs_in_kg as one of the links. However, if you try "15.713lbs in kg", you don't get http://wiki.answers.com/Q/How_much_is_15_713_lbs_in_kg or anything similar in the list. If you search for "15.71349lbs in kg", you get nothing (except the output of Google's own converter). As you mentioned, it's not that decimals aren't handled: http://www.trueknowledge.com/q/15.1_kg_in_lbs is the first link when searching for "15.1lbs in kg".

Disclaimer: I don't know what these sites do and how they do it, this is just my opinion.

These must be generated from user queries somehow. Probably the most productive source is the search bar on http://www.trueknowledge.com/. When users search there, the site can automatically generate links that Google can then find. If you go to some pages on the site, such as http://www.trueknowledge.com/recent-activity, you can see that there are a lot of questions on the page, each with a link similar to the one you posted. This is one of the ways Google finds them. "15lbs in kg" is probably a very common query, so it has probably been asked many times already and appears among those questions.
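To make the idea of "automatically generating links" from user queries concrete, here is a minimal sketch. It is only my guess at the general approach, not how trueknowledge.com actually does it: the function names and the `http://example.com/q/` base URL are made up for illustration, and the slug format is simply modelled on the URLs quoted above.

```python
import re


def query_to_slug(query: str) -> str:
    """Turn a free-text query into a URL-safe slug (hypothetical helper)."""
    # Lowercase the query, keep letters, digits and dots (so decimals
    # such as 15.1 survive), and collapse every other run of characters
    # into a single underscore.
    slug = re.sub(r"[^a-z0-9.]+", "_", query.lower())
    # Trim separators left over at either end, e.g. from a trailing "?".
    return slug.strip("_.")


def query_to_url(query: str, base: str = "http://example.com/q/") -> str:
    """Build a crawlable permalink for a query (base URL is an assumption)."""
    return base + query_to_slug(query)


print(query_to_url("What is 15.1 kg in lbs?"))
# http://example.com/q/what_is_15.1_kg_in_lbs
```

Once every query typed into the site's own search box gets a permalink like this, listing those permalinks on an ordinary page is enough for a crawler to discover them.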

Note, also, that there are question pages, such as http://www.trueknowledge.com/new-questions/100. If you crawl from there (and, believe me, Google has fast crawlers :)) you can get 100 questions per page. The last page as of now is http://www.trueknowledge.com/new-questions/94000 - note that this is 94000 links per crawl, which probably happens very frequently for this type of site.

There are many other possible techniques, of course:

  • Some sites give you a free toolbar to install. Every query that you run through that toolbar ends up in the hands of that site.
  • Some sites do their own crawling, in the same way Google does.
  • You can use the referrer (see How does a website highlight search terms you used in the search engine?) to get the queries performed by users who land on your site.
  • Pre-generation, as you mentioned, is definitely used - sites like trueknowledge.com had to have a huge base before they launched, which they probably enhanced by pre-generating the data, e.g. by using dictionaries or lists of towns in the world.
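As an illustration of the referrer technique from the list above, here is a minimal sketch using only Python's standard library. It assumes the referring search engine puts the query in a `q` URL parameter and actually sends it along in the `Referer` header, which is not guaranteed; the function name is made up for this example.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_terms_from_referrer(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search phrase carried in a referrer URL, if any.

    `param` is the query-string key holding the phrase; "q" is an
    assumption and may differ between search engines.
    """
    # parse_qs decodes percent-escapes and turns "+" into spaces.
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None


print(search_terms_from_referrer("http://www.google.com/search?q=15lbs+in+kg"))
# 15lbs in kg
```

A site could log every phrase recovered this way and turn it into a new page, which is one way such a site could grow without guessing phrases up front.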

The volume of information on the Internet today is so huge that it's arguably not very hard to generate links the way trueknowledge.com does. The hard parts these guys face are on the other side: searching and returning meaningful results fast.

Community
icyrock.com