I have a list of strings that I want to classify into groups. I then want to show one string from each group.
Say my list looks like this:
- The quick brown fox jumps over the lazy dog
- The quick brown fox jumps over the lazy dog!!!!
- The brown fox jumps over the lazy dog
- Zing, dwarf jocks vex lymph
- dwarf jocks vex lymph123
- I love cookies
Then I want to show something like this (one string from each class):
- The quick brown fox jumps over the lazy dog
- dwarf jocks vex lymph123
- I love cookies
I know trigrams are an easy and useful way to decide whether two strings are similar or different. I'm also fairly sure they can be used to divide a list of strings into classes, but I'm not sure how.
Can anyone here help me, or should I use something completely different?
I would much prefer a simple, maintainable method over a highly accurate one.
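To make the trigram idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes Jaccard similarity over character trigrams and a greedy pass that compares each string only to the first member of each existing group. The function names (`trigrams`, `similarity`, `group_strings`) and the `threshold=0.5` cutoff are my own guesses for illustration; the cutoff would probably need tuning on real data.

```python
def trigrams(text):
    """Return the set of character trigrams of text, lowercased."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        # Strings shorter than 3 characters have no trigrams;
        # fall back to a case-insensitive equality check.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(ta & tb) / len(ta | tb)


def group_strings(strings, threshold=0.5):
    """Greedy grouping: put each string into the first group whose
    first member is similar enough, otherwise start a new group.
    The threshold is an assumed value, not a recommended one."""
    groups = []
    for s in strings:
        for group in groups:
            if similarity(s, group[0]) >= threshold:
                group.append(s)
                break
        else:
            groups.append([s])
    return groups


strings = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog!!!!",
    "The brown fox jumps over the lazy dog",
    "Zing, dwarf jocks vex lymph",
    "dwarf jocks vex lymph123",
    "I love cookies",
]

groups = group_strings(strings)
# Show one string per group (here simply the first one seen).
representatives = [group[0] for group in groups]
for r in representatives:
    print(r)
```

On the example list this gives three groups. The result depends on input order and on the threshold, and comparing only against the first member of each group keeps it simple at the cost of some accuracy.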