Today I came across a task and enjoyed solving it with clean code, so I decided it'd be cool to share it with the rest of the class. But hey, let's keep it in the format of a question.
The task:
Given an instance of type T (the source) and a collection of instances of type T (the possible suggestions), provide the suggestions that are similar to the source, ordered from most to least similar, and entirely excluding any suggestion whose similarity is below a certain threshold.
Similarity is a fuzzy string comparison of multiple fields of the instance, with each field given an importance weight.
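To make the scoring rule concrete, here is a minimal sketch of the weighted similarity for a single candidate. It is only an illustration under some assumptions of mine: Python as the language, instances modelled as plain dicts, weights given as fractions that sum to 1, and difflib's SequenceMatcher ratio (compared case-insensitively) standing in for whichever fuzzy comparer you prefer. The names string_similarity and weighted_similarity are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1].

    Assumption: difflib's ratio on lower-cased strings; any fuzzy
    comparer with the same signature could be swapped in.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(source: dict, candidate: dict, weights: dict) -> float:
    """Sum the per-field similarities, each scaled by its field's weight.

    Assumption: weights are fractions that sum to 1, so the result
    also lies in [0, 1] and can be compared against a threshold.
    """
    return sum(
        weight * string_similarity(source[field], candidate[field])
        for field, weight in weights.items()
    )


# Weights and source taken from the example in this post.
weights = {"A": 0.3, "B": 0.5, "C": 0.2}
source = {"A": "Hello", "B": "World", "C": "and welcome!"}
candidate = {"A": "Hell", "B": "World", "C": "arrives..."}

score = weighted_similarity(source, candidate, weights)
```

The exact scores depend on the comparer you plug in, so this sketch is not guaranteed to reproduce the ordering shown in the example output.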
Example input:
Source instance:
{A = "Hello", B = "World", C = "and welcome!"}
Possible suggestions:
{A = "Hola", B = "World", C = "Welcome!"}
{A = "Bye", B = "world", C = "and fairwell"}
{A = "Hell", B = "World", C = "arrives..."}
{A = "Hello", B = "Earth", C = "and welcome!"}
{A = "Hi", B = "world", C = "welcome!"}
Importance of fields:
- A: 30%
- B: 50%
- C: 20%
Example output:
[0] = {A = "Hell", B = "World", C = "arrives..."}
[1] = {A = "Hola", B = "World", C = "Welcome!"}
[2] = {A = "Hello", B = "Earth", C = "and welcome!"}
[3] = {A = "Hi", B = "world", C = "welcome!"}
Note that the possible suggestion Bye;world;and fairwell is not here at all, as it doesn't meet the minimum similarity threshold (let's say the threshold is at least 50% weighted similarity).
The first result is the most similar to the source even though its C field is not similar to the source at all, because we gave C a weight as low as 20%, and the other two, more heavily weighted fields are very similar (or an exact match) to the source.
Fuzzy comparison side-note
The algorithm used for comparing string a and string b can be any of the known fuzzy comparison algorithms; that's not really the point here.
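For readers who want something concrete to plug in, here is one possible comparer: a normalized Levenshtein similarity, sketched in Python. This is just one of many options and not a recommendation; the function names are hypothetical, and the normalization (dividing the edit distance by the longer string's length) is my own assumption for mapping the distance into the 0 to 1 range.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the edit distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein(a, b) / longest
```

Any function with the same shape (two strings in, a number between 0 and 1 out) can take its place, which is why the choice of algorithm is a side detail here.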
So how could one turn that list of possible suggestions into an actual list of ordered suggestions? (Oh lord, please help, etc)