I have made a clumsy first attempt at fuzzy pattern matching using the `re` module in Python 2.7.

Unfortunately every attempt I make returns an empty list. I simply don't understand the syntax required. I was wondering if someone might tell me why the following code:

import re
m = re.findall('(ATCT){e<=1}', 'ATCGATCGGCATGCAGTGCAGAAGTGACGAT')
print m

returns an empty list?

Casimir et Hippolyte
poppyseeds
  • What's ur expected output? – Avinash Raj Feb 17 '16 at 11:28
  • Are you only interested in the explanation? Not in a solution? Your regex is a mess. It matches `ATCT` followed by `{e<=1}` - these are literal character sequences. See [what it matches](https://regex101.com/r/dR7sK9/1). – Wiktor Stribiżew Feb 17 '16 at 11:28
  • @AvinashRaj the pattern should match to several places in the string with one match, the output should be a list of those patterns. – poppyseeds Feb 17 '16 at 11:29
  • @WiktorStribiżew I am interested in both. – poppyseeds Feb 17 '16 at 11:30
  • If you are interested in a solution, please explain - illustrate - what you need to obtain. – Wiktor Stribiżew Feb 17 '16 at 11:31
  • From my understanding, the code above should return a list ['ATCG', 'ATCG']. Clearly my understanding is incorrect so I am interested in what is wrong with the code as is, and in obtaining resources where I might read how to formulate this type of regex search. – poppyseeds Feb 17 '16 at 11:37
  • @poppyseeds: I wonder why you expect `ATCG` match when you use `ATCT` in the pattern? See https://regex101.com/r/dR7sK9/2 – Wiktor Stribiżew Feb 17 '16 at 11:46
  • @WiktorStribiżew I expect this as I am attempting to implement fuzzy matching, as opposed to exact matching. The code should allow one error as specified by the {e<=1} which should not be read literally, as detailed here : https://pypi.python.org/pypi/regex/2014.10.09 – poppyseeds Feb 17 '16 at 11:48
  • Yes, `regex`, not `re`. Then, why did you `import re` and not `regex`? See `>>> import regex >>> m = regex.findall('(ATCT){e<=1}', 'ATCGATCGGCATGCAGTGCAGAAGTGACGAT') >>> print(m) ['ATCG', 'ATCG'] >>> ` It works as you expect! I guess the question should be closed due to a typo. – Wiktor Stribiżew Feb 17 '16 at 11:55

1 Answer

Since you intended to use the PyPI `regex` module, you need to import `regex` instead of `re`:

>>> import regex
>>> m = regex.findall('(ATCT){e<=1}', 'ATCGATCGGCATGCAGTGCAGAAGTGACGAT')
>>> print(m)
['ATCG', 'ATCG']
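
To see why the original `re` call came back empty, here is a minimal sketch using only the standard library. It relies on `re` reading `{e<=1}` as literal characters rather than as a fuzzy quantifier, as pointed out in the comments. The test string `'xxATCT{e<=1}xx'` and the variable names are made up for illustration and are not from the question.

```python
import re

pattern = '(ATCT){e<=1}'
dna = 'ATCGATCGGCATGCAGTGCAGAAGTGACGAT'

# The standard-library re module has no fuzzy-matching syntax, so "{e<=1}"
# is treated as ordinary text that must appear right after "ATCT".
no_hits = re.findall(pattern, dna)
print(no_hits)  # [] because the DNA string never contains "ATCT{e<=1}"

# Hypothetical input that does contain the literal text. findall returns
# the contents of the single capture group, i.e. "ATCT".
literal_hits = re.findall(pattern, 'xxATCT{e<=1}xx')
print(literal_hits)  # ['ATCT']
```

So the pattern itself is fine for `regex`; it is only the `import re` line that turns the fuzzy constraint into literal text.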
Wiktor Stribiżew