1

I have a text file like the one below:

  • INPUT.txt

    155    Phe  12xD,7xQ,5xE,5xG,4xA,4xS,2xF,2xH,2xI,2xK,1xM,1xN
    151    Glu  11xD,6xA,5xE,3xF,3xG,3xM,2xI,2xS,1xH,1xK,1xL,1xP
    159    Thr  15xF,6xL,6xM,5xG,5xI,5xT,4xA,4xV,3xR,1xD,1xN,1xP
    

My aim is to keep only the entries in the 3rd column whose count is >= 6.

What I tried: replacing 1x(any letter), 2x(any letter), 3x(any letter), 4x(any letter) and 5x(any letter) with nothing, using the following script:

import re

filename = "INPUT.txt"
with open(filename, "r") as filepointer:  # Read the file
    text = filepointer.read()
merged = text.splitlines()
for i in merged:
    # Replace 1x, 2x, 3x, 4x, 5x (plus the following letter) with nothing
    print(re.sub("[0-5]x[a-zA-Z]", "", i.rstrip()))

OUTPUT:

155    Phe      1,7xQ # the 2xD in 12xD was also replaced
151    Glu      1,6xA # the 1xD in 11xD was also replaced
159    Thr      1,6xL,6xM # the 5xF in 15xF was also replaced

Removing 1x, 2x, 3x, 4x and 5x works, but the same pattern also matches inside 11x, 12x, 13x, 14x and 15x, so part of those entries is removed too. I want to restrict the replacement to single-digit counts only and leave multi-digit counts untouched.

  • Expected OUTPUT:

     155    Phe      12xD,7xQ
     151    Glu      11xD,6xA
     159    Thr      15xF,6xL,6xM
    

I hope my question is clear.

I just want to remove an entry whose count is a lone 1, not the 1 that is part of 11, 21, 31, 41 and so on.

Thank you in advance.

user3805057

2 Answers

3

You may use

re.sub(r",?\b[0-5]x[a-zA-Z]\b","", s)

See IDEONE demo

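The regex ,?\b[0-5]x[a-zA-Z]\b uses word boundaries (\b), so the digit + x + letter sequence only matches when it is not adjacent to another word character ([a-zA-Z0-9_]) on either side. That is why the 2xD inside 12xD no longer matches: there is no word boundary between 1 and 2. The comma at the start is optional (? matches 1 or 0 occurrences of the preceding subpattern), so the separator is removed together with the entry.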

Also, please note that regular expressions are best declared using "raw" string literals (see r"" notation). That way, we do not have to use double backslashes when using the word boundary.
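To see the word boundary at work, here is a minimal sketch that applies the pattern above to the three sample lines from the question. The function name drop_low_counts and the SAMPLE list are only illustrative and not part of the original code. Note that if the first entry in a line had a count below 6, the comma of the next entry would be left behind, which does not occur in this data.

```python
import re

# Same pattern as above: an optional comma, then a single digit 0-5,
# "x" and one letter, delimited by word boundaries on both sides.
LOW_COUNT = re.compile(r",?\b[0-5]x[a-zA-Z]\b")

# Sample lines copied from the question (illustrative constant).
SAMPLE = [
    "155    Phe  12xD,7xQ,5xE,5xG,4xA,4xS,2xF,2xH,2xI,2xK,1xM,1xN",
    "151    Glu  11xD,6xA,5xE,3xF,3xG,3xM,2xI,2xS,1xH,1xK,1xL,1xP",
    "159    Thr  15xF,6xL,6xM,5xG,5xI,5xT,4xA,4xV,3xR,1xD,1xN,1xP",
]


def drop_low_counts(line):
    """Remove every entry whose count is a single digit from 0 to 5."""
    return LOW_COUNT.sub("", line.rstrip())


for line in SAMPLE:
    print(drop_low_counts(line))
```

Multi-digit counts such as 12xD survive because \b cannot match between two digits.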

Wiktor Stribiżew
1

Another way is to use a function.

The contents of reg.txt are:

155    Phe  12xD,7xQ,5xE,5xG,4xA,4xS,2xF,2xH,2xI,2xK,1xM,1xN
151    Glu  11xD,6xA,5xE,3xF,3xG,3xM,2xI,2xS,1xH,1xK,1xL,1xP
159    Thr  15xF,6xL,6xM,5xG,5xI,5xT,4xA,4xV,3xR,1xD,1xN,1xP

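p = r"C:\reg.txt"

def changer(l):
    # Split on commas; d[0] holds the first two columns plus the first entry
    d = l.rstrip().split(',')
    dd = d[1:]
    lst = ['6', '7', '8', '9']
    # Keep the remaining entries whose first character is 6-9
    # (a later multi-digit count such as 10xA would be dropped by this check)
    s = [i for i in dd if i[0] in lst]
    s.insert(0, d[0])  # the first entry is always kept
    return ','.join(s)

with open(p) as f:
    for i in f:
        print(changer(i))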

This prints:

155    Phe  12xD,7xQ
151    Glu  11xD,6xA
159    Thr  15xF,6xL,6xM
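
A variation on the same idea, sketched here rather than taken from the answer above, is to compare the counts as numbers instead of checking the first character. That way multi-digit counts anywhere in the list are kept and the first entry is filtered like the others. The name keep_at_least and its minimum parameter are hypothetical, and the sketch assumes every entry has the form <count>x<letter>; a malformed entry would raise ValueError.

```python
def keep_at_least(line, minimum=6):
    """Keep only the entries in the 3rd column whose count is >= minimum."""
    # Split into position, residue name and the comma-separated entries.
    parts = line.split(None, 2)
    if len(parts) < 3:
        # Nothing to filter on blank or short lines.
        return line.rstrip()
    number, residue, entries = parts
    kept = [
        entry
        for entry in entries.strip().split(",")
        # The count is everything before the first "x", e.g. "12" in "12xD".
        if int(entry.split("x", 1)[0]) >= minimum
    ]
    return "{}    {}  {}".format(number, residue, ",".join(kept))


print(keep_at_least("155    Phe  12xD,7xQ,5xE,5xG,4xA,4xS,2xF,2xH,2xI,2xK,1xM,1xN"))
```

Comparing integers matches the stated goal (counts >= 6) directly, at the cost of assuming well-formed input.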
Learner