
I have two sets of protein sequence data. The two sequences look the same, but they actually differ by one amino acid (letter).

For example:

File 1:

TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K

File 2:

TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K

Desired output:

File 1:

E

File 2:

V

I know that grep, comm and diff can print the differences between two sets of data, but they compare line by line. In this situation, how do I print the letter that differs between the two sequences? Thanks.

Karthick Balaji

3 Answers


I don't think you need the re module here. A simple loop will do.

file1 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'
file2 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'
# Compare the two strings position by position and print each pair that differs
for a, b in zip(file1, file2):
    if a != b:
        print(a, b)

The output is: E V

This compares the two strings letter by letter.
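The loop above works on strings already in memory and prints only the letters. A minimal sketch that also reports the 1-based position of every mismatch, plus a helper for reading a sequence from a file. It assumes each file holds a single sequence as plain text; the helper names and the file names in the comment are hypothetical.

```python
from pathlib import Path


def read_sequence(path):
    """Read a plain-text sequence file, joining its lines and dropping all whitespace."""
    return "".join(Path(path).read_text().split())


def mismatches(seq1, seq2):
    """List (1-based position, letter in seq1, letter in seq2) for each differing position.

    zip() stops at the end of the shorter sequence, so extra trailing letters are
    not reported here; compare len(seq1) and len(seq2) separately if that matters.
    """
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip(seq1, seq2), start=1)
            if a != b]


# With real files you might call (hypothetical names):
#   seq1 = read_sequence("file1.txt"); seq2 = read_sequence("file2.txt")
seq1 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
seq2 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
for pos, a, b in mismatches(seq1, seq2):
    print(f"Position {pos}: File 1: {a}  File 2: {b}")
```

For the sequences in the question this prints a single line for position 34 (E in file 1, V in file 2).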

Karthick Balaji

In R, with a for loop:

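library(microbenchmark)

file1 <- "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
file2 <- "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"

# Print the first letter that differs between the two strings, then stop
test_two_strings <- function(string1 = file1, string2 = file2) {
  for (i in seq_len(nchar(string1))) {
    a <- substr(string1, i, i)
    b <- substr(string2, i, i)
    if (a != b) {
      cat(sprintf("File 1:\n%s\nFile 2:\n%s\n", a, b))
      break
    }
  }
}

microbenchmark(test_two_strings(), times = 1000)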

Unit: microseconds
               expr     min      lq     mean  median      uq      max neval
 test_two_strings() 133.927 144.199 169.5508 148.544 160.791 2132.148  1000
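For comparison, a Python sketch of the same early-exit idea: next() over a lazy generator stops scanning at the first mismatch, much like the break in the R loop. The function name first_difference is hypothetical, and timeit results depend on the machine, so no timings are quoted here.

```python
import timeit


def first_difference(s1, s2):
    """Return (0-based index, char in s1, char in s2) for the first mismatch, or None.

    The generator is consumed lazily, so next() stops at the first differing
    position instead of scanning the whole string.
    """
    return next(((i, a, b) for i, (a, b) in enumerate(zip(s1, s2)) if a != b), None)


file1 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
file2 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"

result = first_difference(file1, file2)
if result is not None:
    _, a, b = result
    print("File 1:", a, "File 2:", b, sep="\n")

# Rough timing, analogous to microbenchmark(..., times = 1000)
seconds = timeit.timeit(lambda: first_difference(file1, file2), number=1000)
print(f"mean: {seconds / 1000 * 1e6:.2f} microseconds per call")
```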
InfiniteFlash

You can also try this: I compare the strings two characters at a time, and when a pair differs I check which character inside that pair is different.

str1 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSESsVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
str2 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSVSsVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"

# Step through the strings in non-overlapping pairs of characters
for i in range(0, len(str1), 2):
    if str1[i:i+2] != str2[i:i+2]:
        # The pair differs: check each of its two characters
        if str1[i] != str2[i]:
            print(str1[i] + "\n" + str2[i])
        if str1[i+1:i+2] != str2[i+1:i+2]:
            print(str1[i+1:i+2] + "\n" + str2[i+1:i+2])
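The two-characters-at-a-time idea can be generalized to any chunk size: compare whole chunks first and only inspect individual characters inside chunks that differ. A sketch, assuming both strings have the same length; chunked_diff and its size parameter are hypothetical names. Whether this beats a plain per-character loop likely depends on the chunk size and on how many differences there are.

```python
def chunked_diff(s1, s2, size=2):
    """Return (0-based index, char in s1, char in s2) for every mismatch.

    Strings are compared `size` characters at a time; only chunks that differ
    are scanned character by character. Assumes len(s1) == len(s2).
    """
    diffs = []
    for start in range(0, len(s1), size):
        chunk1 = s1[start:start + size]
        chunk2 = s2[start:start + size]
        if chunk1 != chunk2:
            # Drill down into the differing chunk to find the exact positions
            for offset, (a, b) in enumerate(zip(chunk1, chunk2)):
                if a != b:
                    diffs.append((start + offset, a, b))
    return diffs


str1 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSESsVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
str2 = "TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSVSsVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K"
for index, a, b in chunked_diff(str1, str2):
    print(a + "\n" + b)
```

Unlike a fixed if/else on the pair, this reports both letters when both characters of a chunk differ.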
mkHun