I am trying to present human-entered words so that variants referring to the same thing are easier to recognise as a group; essentially a spellchecker. So far I have built a large string-distance matrix (the real one is roughly 250 x 250). The code that builds it is identical to the reproducible example below. (I populated the example with a random word generator; the real values make much more sense but are confidential.)
library(stringdist)  # provides stringdistmatrix()
strings <- c("domineering","curl","axiomatic","root","gratis","secretary","lopsided","cumbersome","oval","mighty","thaw","troubled","furniture","round","soak","callous","melted","wealthy","sweltering","verdant","fence","eyes","ugliest","card","quickest","harm","brake","alarm","report","glue","eyes","hollow","quince","pack","twig","knot")
matrix <- stringdistmatrix(strings, strings, useNames = TRUE)
Now I want to create a new table with two columns. The first column must contain the pairs of elements of 'strings' whose string distance is nonzero and below some threshold (for this example, stringdist < 7); the second column must contain that distance. The table should also list each pair only once, rather than repeating the mirrored entries of the symmetric matrix, e.g. both (oval, curl: 3) and (curl, oval: 3).
I have a feeling that this will require an apply function of some sort, but I haven't a clue how to go about it.
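To make the target concrete, here is a minimal sketch of one way such a table might be built without an apply call. It is only an illustration under stated assumptions: it uses a four-word subset of the example vector plus one repeat, the names `words`, `d`, `idx` and `result` are made up for this sketch, and the threshold of 7 is the one from the example above. The idea is to look only above the diagonal with upper.tri(), so each unordered pair is seen once.

```r
library(stringdist)

# Small illustrative subset of the example vector (with one repeated word)
strings <- c("curl", "oval", "root", "domineering", "oval")

# unique() stops a repeated word from producing the same pair twice
words <- unique(strings)
d <- stringdistmatrix(words, words, useNames = TRUE)

# Positions above the diagonal with a nonzero distance below the threshold;
# arr.ind = TRUE returns a two-column matrix of (row, col) indices
idx <- which(upper.tri(d) & d > 0 & d < 7, arr.ind = TRUE)

# First column: the pair of words; second column: their string distance
result <- data.frame(
  pair = paste(rownames(d)[idx[, "row"]], colnames(d)[idx[, "col"]], sep = ", "),
  stringdist = d[idx],
  stringsAsFactors = FALSE
)

print(result)
```

Whether dropping repeated words with unique() is acceptable depends on the real data; if repeats carry meaning, that step would need rethinking.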
Cheers.