I have a dataset of of 500 mobile devices having 10 attributes namely
Date|Company|ModelName|Price|HardDisk|RAM|Colour|Display size|Cam1|Cam2
The sample dataset is given below :
24/10/2015 | walmart | Samsung Galaxy Note 4 N910H 32GB Unlocked GSM OctaCore Cell Phone-N910H 32GB GOLD | 599.99 | 32 | N/A | cell gold | N/A | 10.2 | 16
25/10/2015 | walmart | Samsung Galaxy Note 5 SM-N920i Gold International Model Unlocked GSM Mobile Phone | 717.95 | 32 | N/A | gold | N/A | 5.7 | 16
26/10/2015 | amazon | T-Mobile AllShare Cast Wireless Hub | 65.15 | N/A | N/A | streaming | N/A | N/A | N/A
I have to find the the most similar or unique devices or remove duplicate mobile devices from the dataset by taking into account the various attributes of the mobile devices.
I have explored many similarity algorithms like Jaccard similarity, cosine similarity. Levenshtein Distance but they seem to work upon attributes with same datatype.
Please suggest an algorithm or approach that could work on this type of mixed datatype dataset taking into account almost all attributes.