I am trying to build a similar-products feature using LSH, and I have the following question.
My data has the following schema:
id: long,
title: string,
description: string,
category: string,
price: double,
inventory_count: int,
active: boolean,
date_added: datetime
Should I perform LSH on each feature separately and then combine the results in some way, maybe with a weighted average?
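For reference, the first option could look roughly like the sketch below. It makes several assumptions that are not from the original question. It builds one MinHash signature per text feature (`title`, `description`, `category`) from character 3-shingles. It estimates a Jaccard similarity per feature. It then combines those estimates with a weighted average. The weights in `WEIGHTS`, the shingle size, the signature length, and the use of seeded `blake2b` hashes as the hash family are all hypothetical choices. Numeric fields such as `price` are left out because Jaccard on raw numbers is not meaningful without some bucketing.

```python
import hashlib

NUM_HASHES = 64  # signature length (assumed)

def shingles(text, k=3):
    """Character k-shingles of a lowercased string (the whole string if shorter than k)."""
    t = text.lower()
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}

def minhash(tokens, num_hashes=NUM_HASHES):
    """For each seeded 64-bit hash, keep the minimum hash value over all tokens."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(), "big"
            )
            for tok in tokens
        )
        for seed in range(num_hashes)
    ]

def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical per-feature weights; they would need tuning on real data.
WEIGHTS = {"title": 0.5, "description": 0.3, "category": 0.2}

def combined_similarity(item_a, item_b, weights=WEIGHTS):
    """Weighted average of per-feature MinHash Jaccard estimates."""
    total = 0.0
    for feature, w in weights.items():
        sig_a = minhash(shingles(item_a[feature]))
        sig_b = minhash(shingles(item_b[feature]))
        total += w * estimate_jaccard(sig_a, sig_b)
    return total / sum(weights.values())

# Example usage with made-up records.
a = {"title": "Apple iPhone 12", "description": "64GB smartphone", "category": "phones"}
b = {"title": "Apple iPhone 12 Pro", "description": "128GB smartphone", "category": "phones"}
c = {"title": "Wool socks", "description": "warm winter", "category": "clothing"}
print(combined_similarity(a, b), combined_similarity(a, c))
```

One consequence of this design is that every feature needs its own signatures (and, at scale, its own LSH index). Candidate pairs would therefore have to be retrieved per feature and merged before the weighted score can be computed.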
or
Should I build a single LSH over all features together (basically prefixing each shingle with its feature name, like title_iphone, title_nexus, price_1200.25, active_1, ...) and then perform LSH on that bag of tokens using a bag-of-words approach?
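The second option (one bag of feature-prefixed tokens per record) might be sketched like this. Several details are assumptions added for illustration and are not from the question. Prices are bucketed into ranges of width `price_bucket`, because an exact token like `price_1200.25` would almost never match another item. `date_added` is reduced to year and month. `id` and `inventory_count` are dropped. Titles and descriptions are split into whole words. The MinHash uses seeded `blake2b` hashes as a stand-in hash family.

```python
import hashlib
from datetime import datetime

def to_tokens(item, price_bucket=100.0):
    """Turn one product record into a bag of feature-prefixed tokens."""
    tokens = {f"title_{w}" for w in item["title"].lower().split()}
    tokens |= {f"description_{w}" for w in item["description"].lower().split()}
    tokens.add(f"category_{item['category'].lower()}")
    # Assumption: bucket prices so that close prices share one token.
    tokens.add(f"price_{int(item['price'] // price_bucket)}")
    tokens.add(f"active_{int(item['active'])}")
    # Assumption: month granularity for date_added.
    tokens.add(f"date_added_{item['date_added']:%Y-%m}")
    # id and inventory_count are left out (assumed not to indicate similarity).
    return tokens

def minhash(tokens, num_hashes=128):
    """For each seeded 64-bit hash, keep the minimum hash value over the bag."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(), "big"
            )
            for tok in tokens
        )
        for seed in range(num_hashes)
    ]

def jaccard(a, b):
    """Exact Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b)

def estimated_jaccard(sig_a, sig_b):
    """MinHash estimate: fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Example usage with made-up records following the question's schema.
phone = {"id": 1, "title": "Apple iPhone 12", "description": "64GB smartphone",
         "category": "Phones", "price": 799.0, "inventory_count": 5,
         "active": True, "date_added": datetime(2021, 3, 1)}
phone_pro = {"id": 2, "title": "Apple iPhone 12 Pro", "description": "128GB smartphone",
             "category": "Phones", "price": 999.0, "inventory_count": 2,
             "active": True, "date_added": datetime(2021, 3, 15)}

ta, tb = to_tokens(phone), to_tokens(phone_pro)
print(jaccard(ta, tb), estimated_jaccard(minhash(ta), minhash(tb)))
```

A trade-off of this approach is that every token counts equally in the Jaccard similarity. A long description therefore contributes many more tokens than the single `category_` or `price_` token. Weighting features has to be done indirectly, for example by repeating important tokens under distinct names.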
If someone could point me to a document that explains how to perform LSH on structured data such as e-commerce product data, that would be great.
P.S. I'm planning to use Spark with MinHash as the LSH hash function. Let me know if you need any more details.
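Spark ML ships a `MinHashLSH` estimator (in `pyspark.ml.feature`). Its `approxSimilarityJoin` returns pairs of rows within a given Jaccard-distance threshold, so the candidate search would normally be left to Spark. To illustrate what LSH does with MinHash signatures, here is a plain-Python sketch of the classic banding technique. This is not Spark's internal implementation. The bucketing scheme, the `bands`/`rows` values, and the seeded `blake2b` hash family are assumptions. The signature is split into `bands` groups of `rows` values, and two items become a candidate pair if they agree on every value of at least one band.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def minhash(tokens, num_hashes):
    """For each seeded 64-bit hash, keep the minimum hash value over the token set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(), "big"
            )
            for tok in tokens
        )
        for seed in range(num_hashes)
    ]

def candidate_pairs(bags, bands=16, rows=4):
    """Banding LSH: items whose signatures match on all rows of some band are candidates."""
    buckets = defaultdict(set)
    for item_id, tokens in bags.items():
        sig = minhash(tokens, bands * rows)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(item_id)  # key includes band index
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Example usage with hypothetical feature-prefixed token bags.
bags = {
    1: {"title_apple", "title_iphone", "title_12", "category_phones"},
    2: {"title_apple", "title_iphone", "title_12", "category_phones"},
    3: {"title_wool", "title_socks", "category_clothing"},
}
print(candidate_pairs(bags))
```

With this scheme, two items with Jaccard similarity `s` become candidates with probability `1 - (1 - s**rows)**bands`. For `bands=16, rows=4`, that curve rises steeply around `s ≈ (1/16)**(1/4) = 0.5`. Candidates would then be re-scored with the estimated or exact similarity before being shown as "similar products".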