I have a data frame with a column of class 'character'. For each row, I am trying to (a) create a new variable summarizing how similar that row's value is to the most similar other value in the column, and (b) identify the row holding that most similar value.
My existing approach is to calculate an edit distance measure using the stringdist package (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf). This has two problems. First, it seems to be extremely computationally demanding: after hours of waiting it still has not finished. Second, it is not clear how to find the smallest distance for each observation when the comparison values come from the same vector, and the package does not appear to return the index of the most similar value.
Is there a reasonably computationally tractable way to get, for each observation, the minimal distance to any other value and the row at which that distance is attained?
# Create data
data.frame(x = c("a","abbb","aa", "abbbkdjsfjldkfjldfkjl"))
# Want something like
data.frame(smallest_distance = c(1,20,1,90), closest_match = c(3,3,1,2))
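For concreteness, the brute-force version of what is being asked could be sketched as below. This is only an illustration under stated assumptions: it uses `stringdistmatrix()` from stringdist with `method = "lv"` (Levenshtein), which is one possible choice of edit distance, and the names `df` and `d` are made up for the example. It builds the full n-by-n matrix, so memory and time grow quadratically with the number of rows, which is exactly the cost the question is trying to avoid.

```r
library(stringdist)

# Toy data from the question (df is a hypothetical name)
df <- data.frame(x = c("a", "abbb", "aa", "abbbkdjsfjldkfjldfkjl"),
                 stringsAsFactors = FALSE)

# Full pairwise distance matrix; method = "lv" is an assumption,
# any other method supported by stringdist could be substituted
d <- stringdistmatrix(df$x, df$x, method = "lv")

# Mask the diagonal so a value is never matched to itself
diag(d) <- NA

# Row-wise minimum distance and the row index where it occurs
# (which.min skips NA and returns the first index on ties)
df$smallest_distance <- apply(d, 1, min, na.rm = TRUE)
df$closest_match     <- apply(d, 1, which.min)
```

The numbers this produces depend on the distance method chosen, so they will not necessarily match the illustrative values in the desired output above.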