I have a machine learning problem. I am given a long list of domains and have to figure out which are ecommerce websites and which are personal websites. It is a difficult problem because I do not have any training data to work with. I have come up with a couple of ideas:
Go through a couple hundred of these websites manually, label each one as ecommerce or personal, and build a training set that way (long and boring!).
Crawl these websites and search for keywords such as "Buy Now", "Price", "Credit Card", etc.
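To make the keyword idea concrete, here is a rough sketch of what I have in mind. It assumes the page HTML has already been downloaded into a string. The keyword list, the function names (strip_tags, keyword_hits, looks_like_ecommerce) and the min_distinct threshold are all placeholders I made up for illustration, not tested values. Its guesses could also serve as noisy starting labels, so the manual pass only has to correct them.

```python
import re

# Hypothetical keyword list; these are guesses and should be tuned on real pages.
ECOMMERCE_KEYWORDS = ["buy now", "price", "credit card", "add to cart", "checkout"]


def strip_tags(html):
    """Crudely drop script/style blocks and tags, then lowercase the visible text."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def keyword_hits(html, keywords=ECOMMERCE_KEYWORDS):
    """Count whole-word occurrences of each keyword in the visible text."""
    text = strip_tags(html)
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
        for kw in keywords
    }


def looks_like_ecommerce(html, min_distinct=2):
    """Guess 'ecommerce' when at least min_distinct different keywords appear.

    min_distinct is an arbitrary assumed threshold, not a validated one.
    """
    hits = keyword_hits(html)
    return sum(1 for count in hits.values() if count > 0) >= min_distinct
```

Requiring several distinct keywords instead of a single match is meant to cut down on false positives from personal pages that happen to mention a price once.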
Does anybody have any other approaches?
Thanks