I have some sequence data to compare. The expected output is a distance matrix showing how similar each sequence is to every other sequence. Previously I used ngram.NGram.compare in Python, and now I want to switch to R. I found the ngram and biogram packages, but I was unable to find a function that generates the expected output.
Assume this is the data:
a <- c("ham","bam","comb")
The output should look like this (the distance between each pair of items):
#       ham   bam   comb
# ham   0     0.5   0.83
# bam   0.5   0     0.6
# comb  0.83  0.6   0
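For reference, the numbers above are consistent with python-ngram's similarity for N=1, assuming its defaults of no padding (pad_len = N - 1 = 0) and warp = 1: similarity is the number of shared unigrams, counted with multiplicity, divided by the number of distinct-or-unshared unigrams across both strings, and the distance is 1 minus that. Writing c(s, t) for the shared character count and |s| for the length of s, this is a sketch of the calculation, not a quote from the library's documentation:

```latex
\begin{aligned}
d(s,t) &= 1 - \frac{c(s,t)}{|s| + |t| - c(s,t)} \\[4pt]
d(\text{ham},\text{bam})  &= 1 - \frac{2}{3+3-2} = 0.5 \\
d(\text{ham},\text{comb}) &= 1 - \frac{1}{3+4-1} = \tfrac{5}{6} \approx 0.83 \\
d(\text{bam},\text{comb}) &= 1 - \frac{2}{3+4-2} = 0.6
\end{aligned}
```

Here ham and bam share {a, m}, ham and comb share {m}, and bam and comb share {b, m}. Because none of these words repeats a character, this coincides with the Jaccard distance on character sets.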
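This is the equivalent Python code. Note that it computes only the pairwise distances above the diagonal (ham/bam, ham/comb, bam/comb) as a flat list, not the full matrix: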
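import ngram

a = ["ham", "bam", "comb"]
# distance = 1 - similarity, for each pair (i, j) with i < j (upper triangle only)
dists = [1 - ngram.NGram.compare(a[i], a[j], N=1)
         for i in range(len(a))
         for j in range(i + 1, len(a))]
print(dists)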
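To make explicit what an R replacement has to reproduce, here is a minimal library-free sketch of the unigram (N=1) case that builds the full symmetric matrix instead of the upper triangle. It assumes the formula 1 - shared / (len(s) + len(t) - shared) with multiset counting, which matches the expected output above; unigram_distance and distance_matrix are hypothetical helper names, and padding, N > 1 and non-default warp values are deliberately not handled.

```python
from collections import Counter


def unigram_distance(s, t):
    """Assumed N=1 distance: 1 - shared / (len(s) + len(t) - shared)."""
    gs, gt = Counter(s), Counter(t)
    # Multiset intersection: each shared character counts min(count_s, count_t) times.
    shared = sum((gs & gt).values())
    total = len(s) + len(t) - shared
    if total == 0:
        # Two empty strings: treated as identical here (an assumption, not
        # necessarily what ngram.NGram.compare does).
        return 0.0
    return 1 - shared / total


def distance_matrix(items):
    """Full symmetric matrix: row i, column j holds the distance between items i and j."""
    return [[unigram_distance(x, y) for y in items] for x in items]


a = ["ham", "bam", "comb"]
for name, row in zip(a, distance_matrix(a)):
    print(name, [round(d, 2) for d in row])
```

Each row printed corresponds to one row of the expected matrix. Translating the same per-pair function into R and applying it to every pair would give the full matrix directly.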