I am trying to find matches of a certain length (4 to 12 bases) in a DNA sequence.

Below is the code:

import re
positions =[]
for i in range(4,12):
    for j in range(len(dna)- i+1):
        positions.append(re.search(dna[j:j+i],comp_dna))

#Remove the 'None' from the search
position_hits = [x for x in positions if x is not None]

I get this:

[<_sre.SRE_Match object; span=(0, 4), match='ATGC'>,.........]

How do I extract the span and match values? I tried .group(), but it raises an error:

AttributeError: 'list' object has no attribute 'group'
Kian

1 Answer

To fix the current approach, call .group() on each match object rather than on the list:

position_hits = [x.group() for x in positions if x]

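Alternatively, collect the matched strings directly in the loop, skipping searches that return None:

import re
position_hits = []
for i in range(4,12):  # lengths 4..11; use range(4, 13) to include 12
    for j in range(len(dna)-i+1):
        m = re.search(dna[j:j+i],comp_dna)
        if m:  # re.search returns None when nothing matches
            position_hits.append(m.group())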
Wiktor Stribiżew
  • Thanks Wiktor. That gives me the match values, but I also need the span values. Is there a way to get those as well? – Kian Jul 05 '18 at 13:57
  • @Kian Perhaps you are asking about [`position_hits.append(m.span())`](https://docs.python.org/2/library/re.html#re.MatchObject.span). – Wiktor Stribiżew Jul 05 '18 at 13:59
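
Putting the answer and the comments together, one way to keep both the span and the matched text is to store them as a tuple per hit. This is only a minimal sketch: the values of `dna` and `comp_dna` are made up for illustration (the question does not show them), `re.escape` is added as a precaution in case a fragment ever contains regex metacharacters, and the upper bound 13 assumes that length 12 should be included.

```python
import re

# Hypothetical sample data; the question does not show dna or comp_dna.
dna = "ATGCGT"
comp_dna = "TTATGCGTAA"

hits = []
for length in range(4, 13):  # 13 so that length 12 is included (assumption)
    for start in range(len(dna) - length + 1):
        fragment = dna[start:start + length]
        # re.search returns a match object for the first occurrence, or None
        m = re.search(re.escape(fragment), comp_dna)
        if m is not None:
            # m.span() is a (start, end) tuple; m.group() is the matched text
            hits.append((m.span(), m.group()))

for span, text in hits:
    print(span, text)
```

Each entry in `hits` can then be unpacked as `(start, end), text`. Note that `re.search` reports only the first occurrence of each fragment in `comp_dna`.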