I have an “entity resolution” use case: I have fewer than 100 features available for each of a few million devices, and my goal is to generate an id for each device. The challenge is that the same device may appear as two or more slightly different records, and I still want all of those records to receive the same device id.
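To make the problem concrete, here is a toy sketch of the behaviour I am after. The feature names, the records, and the 0.6 threshold are all made up for illustration, and the matching rule (share of features that agree after trimming and lower-casing) is only a naive baseline. It compares every pair of records, so it would not scale to millions of devices.

```python
from itertools import combinations

# Hypothetical toy records; real data would have up to ~100 features.
records = [
    {"model": "Pixel 7", "os": "Android 13", "screen": "1080x2400"},
    {"model": "pixel 7 ", "os": "Android 13", "screen": "1080x2400"},
    {"model": "iPhone 14", "os": "iOS 16", "screen": "1170x2532"},
    {"model": "Pixel 7", "os": "Android 14", "screen": "1080x2400"},
]


def normalize(value):
    """Trim whitespace and lower-case a string feature."""
    return value.strip().lower()


def similarity(a, b):
    """Fraction of shared features whose normalized values are equal."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    same = sum(normalize(a[k]) == normalize(b[k]) for k in keys)
    return same / len(keys)


def assign_ids(records, threshold=0.6):
    """Link records whose similarity reaches the threshold (assumed value)
    and give every linked group one id, using a small union-find."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs are compared here: quadratic, fine only for a toy example.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    return [find(i) for i in range(len(records))]


device_ids = assign_ids(records)
print(device_ids)
```

Here the first, second, and fourth records should end up with one shared id, and the third record with a different one.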
I would appreciate recommendations on the following:
- What kind of feature pre-processing should I apply?
- Which algorithms will be best for my purpose?
- Are there standard implementations of these algorithms?
Thanks and regards,