
I am planning to create an affiliate site (a price comparison site).

As you all know, data (products and their info) from different e-commerce sites plays a vital role in price comparison sites like this.

I have already written scripts to scrape product data from the sites I'm interested in, and they work as expected.

In more detail, I am scraping the following common fields and storing them in my DB: 1) product title, 2) product description, 3) price, 4) payment modes, etc. [FYI: I used the JSOUP API to scrape the data.]

PROBLEM STARTS HERE:

I want to group the same product, scraped from different sources, into a single group.

To illustrate my question: say XYZ is a product sold on 5 different sites, each with a slightly different product title.

I scraped the data from these 5 sites and saved it to my DB. Now, how should I effectively group these listings into a single group, so that I can show all 5 sources on a single page of my site?

I have no clue how to proceed.

[String comparison is the first thing that comes to mind, but I don't think it will work in the long run.]
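For reference, here is roughly what a slightly more tolerant form of string comparison looks like: normalize each title (lowercase, strip punctuation), split it into tokens, and compute the Jaccard similarity of the two token sets. This is a minimal sketch, not a recommendation; the class name `TitleSimilarity` and the sample titles are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TitleSimilarity {

    // Lowercase the title, turn every run of non-letter/non-digit characters
    // into a single space, and split the result into a set of tokens.
    public static Set<String> tokens(String title) {
        String normalized = title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
        Set<String> result = new HashSet<>();
        if (!normalized.isEmpty()) {
            result.addAll(Arrays.asList(normalized.split(" ")));
        }
        return result;
    }

    // Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|, in [0, 1].
    public static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Two hypothetical titles for the same product on different sites.
        String a = "Samsung Galaxy S5 (16GB, Black)";
        String b = "SAMSUNG Galaxy S5 - Black - 16 GB";
        System.out.println(jaccard(a, b));
    }
}
```

These two titles only score 4/7 (about 0.57), because "16GB" and "16 GB" produce different tokens. That kind of brittleness is exactly why plain string comparison is unlikely to hold up on its own.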

Any suggestions or recommendations are welcome and appreciated.

If you require any further information, please do not hesitate to add comments.

-JS

Jagdeep84
  • Hi @Jagdeep84. Have you had any success with this? I am stuck in the same situation. Please tell me if you have done this. – Deepak Rathore Oct 12 '15 at 06:39

1 Answer


In the initial phase, you can use Solr to get a relevance score when comparing product titles, or even their descriptions.
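To make the idea of "scoring" titles concrete without running Solr, here is a small in-memory TF-IDF cosine similarity sketch. It is a stand-in only: it does not use any Solr API, and Solr's actual ranking function (BM25 by default in recent versions) differs in detail. The class `TfIdfMatcher`, its methods, and the sample titles are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TfIdfMatcher {
    private final List<Map<String, Integer>> docs = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    // Tokenize a title the same way for indexing and querying.
    static List<String> tokenize(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]+", " ").trim();
        if (normalized.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(normalized.split(" "));
    }

    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(text)) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Index a title and update document frequencies.
    public void add(String title) {
        Map<String, Integer> tf = termFreq(title);
        for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
        docs.add(tf);
    }

    // Smoothed IDF: log((N + 1) / (df + 1)) + 1, so unseen terms still get weight.
    private double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return Math.log((docs.size() + 1.0) / (df + 1.0)) + 1.0;
    }

    private Map<String, Double> weights(Map<String, Integer> tf) {
        Map<String, Double> v = new HashMap<>();
        tf.forEach((t, c) -> v.put(t, c * idf(t)));
        return v;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Cosine similarity between the query title and every indexed title.
    public double[] scores(String query) {
        Map<String, Double> q = weights(termFreq(query));
        double[] out = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) out[i] = cosine(q, weights(docs.get(i)));
        return out;
    }

    public static void main(String[] args) {
        TfIdfMatcher m = new TfIdfMatcher();
        m.add("Apple iPhone 6 16GB Gold");
        m.add("Samsung Galaxy S5 16GB Black");
        m.add("Apple iPhone 6 Plus 64GB Silver");
        System.out.println(Arrays.toString(m.scores("iPhone 6 (16 GB) Gold by Apple")));
    }
}
```

Here the first title scores highest, the "Plus" model second, and the Samsung listing 0. A threshold on such scores can propose candidate matches, but it cannot by itself tell a 16GB model from a 64GB one.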

Going deeper, think about it from the user's side: why is a product considered the same product? It is because of features such as brand, color, material, and so on.

Make a dictionary of the feature set for each catalog: the features that must match before you declare two listings the same product. It is possible that, even for the same feature set, you will still have many candidate products to tell apart; in that case you can use Solr's scoring to help.
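The feature-dictionary idea above can be sketched as building a "match key" from the features a catalog requires and grouping listings by that key. This assumes you can already extract attributes such as brand, model, storage, and color from each scraped page; the `Product` class, the `FEATURES_BY_CATALOG` dictionary, and its catalog and feature names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FeatureGrouping {

    // Hypothetical scraped listing: source site, catalog, title and extracted attributes.
    public static final class Product {
        public final String site;
        public final String catalog;
        public final String title;
        public final Map<String, String> features;

        public Product(String site, String catalog, String title, Map<String, String> features) {
            this.site = site;
            this.catalog = catalog;
            this.title = title;
            this.features = features;
        }
    }

    // Hypothetical "feature dictionary": for each catalog, the attributes that
    // must agree before two listings are declared the same product.
    static final Map<String, List<String>> FEATURES_BY_CATALOG = new HashMap<>();
    static {
        FEATURES_BY_CATALOG.put("mobiles", Arrays.asList("brand", "model", "storage", "color"));
        FEATURES_BY_CATALOG.put("shoes", Arrays.asList("brand", "model", "size", "color"));
    }

    // Convenience builder for a feature map from key/value pairs.
    public static Map<String, String> feats(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Build a match key from the catalog's required features, normalized by
    // lowercasing and removing whitespace. Returns null if any feature is missing.
    public static String matchKey(Product p) {
        List<String> required = FEATURES_BY_CATALOG.get(p.catalog);
        if (required == null) {
            return null;
        }
        StringBuilder key = new StringBuilder(p.catalog);
        for (String name : required) {
            String value = p.features.get(name);
            if (value == null || value.trim().isEmpty()) {
                return null;
            }
            key.append('|').append(value.toLowerCase(Locale.ROOT).replaceAll("\\s+", ""));
        }
        return key.toString();
    }

    // Group listings that share a match key. Listings without a complete key are
    // skipped here; they would need a fallback such as text or image scoring.
    public static Map<String, List<Product>> group(List<Product> products) {
        Map<String, List<Product>> groups = new LinkedHashMap<>();
        for (Product p : products) {
            String key = matchKey(p);
            if (key != null) {
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Product> listings = Arrays.asList(
            new Product("siteA", "mobiles", "Apple iPhone 6 16GB Gold",
                feats("brand", "Apple", "model", "iPhone 6", "storage", "16GB", "color", "Gold")),
            new Product("siteB", "mobiles", "iPhone 6 (16 GB, Gold)",
                feats("brand", "APPLE", "model", "iPhone 6", "storage", "16 GB", "color", "gold")));
        group(listings).forEach((key, ps) -> System.out.println(key + " -> " + ps.size() + " sources"));
    }
}
```

Grouping by an exact key is cheap and easy to explain; the text-scoring step then only has to run inside a group, or for listings whose features could not be extracted.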

Moreover, you can check the Google image search API, which can help you get an image similarity score. This will be helpful for finding identical products in fashion catalogs.
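As a local illustration of image similarity scoring, the sketch below uses a different, much simpler technique than the Google API mentioned above: an "average hash". Each image is reduced to an 8x8 grid of brightness averages, one bit is set per cell brighter than the mean, and two images are compared by the Hamming distance of their 64-bit hashes. It is only robust to resizing and mild brightness changes, not to crops or different backgrounds; the class `AverageHash` and the `twoTone` test-image helper are hypothetical.

```java
import java.awt.image.BufferedImage;

public class AverageHash {

    // Approximate perceived brightness of a packed RGB pixel (ITU-R BT.601 weights).
    static double luma(int rgb) {
        int r = (rgb >> 16) & 0xff;
        int g = (rgb >> 8) & 0xff;
        int b = rgb & 0xff;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Average hash: split the image into an 8x8 grid, average the brightness of
    // each cell, and set one bit for every cell brighter than the overall mean.
    public static long hash(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        if (w < 8 || h < 8) {
            throw new IllegalArgumentException("image must be at least 8x8 pixels");
        }
        double[] cells = new double[64];
        double total = 0;
        for (int cy = 0; cy < 8; cy++) {
            for (int cx = 0; cx < 8; cx++) {
                int x0 = cx * w / 8, x1 = (cx + 1) * w / 8;
                int y0 = cy * h / 8, y1 = (cy + 1) * h / 8;
                double sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += luma(img.getRGB(x, y));
                    }
                }
                cells[cy * 8 + cx] = sum / ((x1 - x0) * (y1 - y0));
                total += cells[cy * 8 + cx];
            }
        }
        double mean = total / 64;
        long bits = 0L;
        for (int i = 0; i < 64; i++) {
            if (cells[i] > mean) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Hamming distance between two hashes: 0 means identical, 64 means opposite.
    public static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Hypothetical test image: left half filled with one color, right half with another.
    public static BufferedImage twoTone(int w, int h, int left, int right) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, x < w / 2 ? left : right);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        long a = hash(twoTone(64, 64, 0xFFFFFF, 0x000000));
        // Same layout, slightly different colors and a different size.
        long b = hash(twoTone(120, 90, 0xE6E6E6, 0x141414));
        System.out.println(distance(a, b));
    }
}
```

The demo prints 0 because both images have the same light/dark layout. In practice a small distance (a few bits) would mark a candidate match to confirm with the other signals.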

Hope this helps.

  • Thanks, Anand, for replying to this thread. I'll surely try it and see to what extent this helps in this case. – Jagdeep84 Oct 29 '14 at 02:59