0

How can I compare two sequences and get their similarity score?

e.g.

l1 <- "AAFCARTTAA"
l2 <- "AAFCAXTTAA"

Which function would return 0.9 here (the similarity between the two sequences)?

Thanks!

I've tried to look through most functions of stringr but can't seem to solve it...

I am looking for a suggestion of a function that would return the value mentioned above.

LasseVoss
    Does this answer your question? [Calculating string similarity as a percentage](https://stackoverflow.com/questions/46446485/calculating-string-similarity-as-a-percentage) – benson23 Apr 14 '23 at 14:56

2 Answers

0

You can use the stringdist() function from the stringdist package.

Several algorithms make it possible to measure the similarity between two character strings, including the Jaccard distance.

l1 <- "AAFCARTTAA" 
l2 <- "AAFCAXTTAA"

stringdist::stringdist(l1, l2, method = "jaccard", q = 4)
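
Note that stringdist() returns a distance (0 means identical), not a similarity. As a rough sketch, assuming the similarity wanted is the share of matching positions, the same package's stringsim() function rescales a distance to a similarity between 0 and 1. The choice of method = "hamming" is an assumption about what "similarity" means here, and it only applies to strings of equal length:

```r
library(stringdist)

l1 <- "AAFCARTTAA"
l2 <- "AAFCAXTTAA"

# Jaccard distance on 4-grams: 3 of the 11 distinct 4-grams are shared,
# so the distance is 1 - 3/11 (about 0.727)
jaccard_dist <- stringdist(l1, l2, method = "jaccard", q = 4)

# Hamming-based similarity: 1 mismatching position out of 10
hamming_sim <- stringsim(l1, l2, method = "hamming")
hamming_sim  # 0.9
```

If the sequences can differ in length, method = "lv" (Levenshtein) may be a better fit, since it also counts insertions and deletions.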
Florent
    But when I run that in RStudio, it returns 0.7272, which seems incorrect, as it should be 0.9? – LasseVoss Apr 14 '23 at 16:13
0

Does this code help?

library(stringr)  # str_split_1() requires stringr >= 1.5.0
sum(str_split_1(l1, pattern = "") %in% str_split_1(l2, pattern = "")) / length(str_split_1(l1, pattern = ""))

There are many ways to calculate the distance between two character strings.
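
One caveat: %in% checks whether each character of l1 occurs anywhere in l2, regardless of position, so it returns 0.9 for this example but would also score "AB" against "BA" as 1. If the intent is a position-by-position comparison, a minimal base R sketch could look like the following. The helper name position_similarity is hypothetical, and equal-length inputs are assumed:

```r
l1 <- "AAFCARTTAA"
l2 <- "AAFCAXTTAA"

# Hypothetical helper: share of positions at which both strings agree
position_similarity <- function(a, b) {
  chars_a <- strsplit(a, "")[[1]]
  chars_b <- strsplit(b, "")[[1]]
  # Assumption: a position-wise comparison only makes sense for equal lengths
  stopifnot(length(chars_a) == length(chars_b))
  mean(chars_a == chars_b)
}

position_similarity(l1, l2)      # 0.9
position_similarity("AB", "BA")  # 0
```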

Florent