
I have a machine learning problem. I am given a long list of domains and have to figure out which are e-commerce websites and which are personal websites. It is a difficult problem because I do not have any labeled training data to work with. I have come up with a couple of ideas:

  1. Manually go through a couple hundred of these websites, label each as business or personal, and build a training set that way (long and boring!).

  2. Crawl these websites and search for keywords such as "Buy Now", "Price", and "Credit Card".
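For idea 2, a minimal sketch of what the keyword check might look like, assuming the page text has already been fetched and stripped of HTML. The function names, the keyword list (taken from the examples above), and the threshold of 2 are all placeholders I made up, not a tested rule:

```python
import re

# Hypothetical keyword list, taken from the examples in the question.
ECOMMERCE_KEYWORDS = ["buy now", "price", "credit card"]


def keyword_score(page_text, keywords=ECOMMERCE_KEYWORDS):
    """Count how many distinct keywords occur in the text (case-insensitive)."""
    text = page_text.lower()
    # \b keeps "price" from matching inside words such as "priceless".
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    )


def looks_like_ecommerce(page_text, threshold=2):
    """Crude rule: flag a page when at least `threshold` keywords are present."""
    return keyword_score(page_text) >= threshold
```

Counting distinct keywords rather than total occurrences keeps one repeated word from dominating the score, but that is a design guess rather than something I have validated.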

Does anybody have any other approaches?

Thanks

user1893354
  • I would think crawling is the right way to go, but I would suggest looking for sites with links such as "Locations", "Contact Us" rather than simply keywords. – Jordan Aug 22 '13 at 20:38
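A rough sketch of the link-text idea from the comment above, using only the standard library's `html.parser`. The class name, the helper name, and the set of link labels are assumptions for illustration; real sites will phrase these links in many other ways:

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the visible text of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.link_texts[-1] += data


# Hypothetical link labels, taken from the comment's examples.
BUSINESS_LINKS = {"locations", "contact us"}


def business_links_found(html):
    """Return the business-style link labels that appear as link text."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and lowercase before comparing.
    found = {" ".join(t.split()).lower() for t in parser.link_texts}
    return found & BUSINESS_LINKS
```

Only text inside `<a>` elements is considered, so the word "Locations" in a paragraph does not count.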

1 Answer


You could adapt your keyword set as you go: as you crawl, any word that correlates highly with the existing keywords can be added to the list.

P.S. I would have added this as a comment, but I don't have enough reputation points.
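One possible way to sketch the keyword-expansion idea, assuming the crawled pages are available as plain-text strings and the seed keywords are single lowercase words. "Correlates highly" is interpreted here as the phi coefficient between "page contains the word" and "page contains any seed keyword"; that choice, the function names, and the `min_phi` and `min_pages` thresholds are all assumptions:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphabetic words in a page."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_keywords(pages, seeds, min_phi=0.5, min_pages=2):
    """Add words whose presence correlates with the presence of a seed keyword."""
    docs = [tokenize(p) for p in pages]
    seeds = set(seeds)
    n = len(docs)
    has_seed = [bool(d & seeds) for d in docs]
    n_seed = sum(has_seed)

    df = Counter()  # number of pages containing the word
    co = Counter()  # number of pages containing the word and a seed
    for doc, seeded in zip(docs, has_seed):
        for word in doc - seeds:
            df[word] += 1
            if seeded:
                co[word] += 1

    new_words = set()
    for word, n_word in df.items():
        if n_word < min_pages:
            continue  # too rare to trust
        n11 = co[word]          # word and seed
        n10 = n_word - n11      # word, no seed
        n01 = n_seed - n11      # seed, no word
        n00 = n - n11 - n10 - n01
        denom = math.sqrt(n_word * (n - n_word) * n_seed * (n - n_seed))
        if denom == 0:
            continue  # word (or seed) is on every page or on none
        phi = (n11 * n00 - n10 * n01) / denom
        if phi >= min_phi:
            new_words.add(word)
    return seeds | new_words
```

Running this repeatedly, feeding each result back in as the new seed set, would give the adaptive behaviour described above, though the list can drift toward unrelated words if the thresholds are too loose.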

Peter