Short Answer
To handle all the cases you have outlined, try the following twist on your changes to the self.cre pattern:
import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;

Digging Deeper
self.cre vs. self.cre2
Yes, the find_include_names method...
def find_include_names(self, node):
    includes = []
    for i in self.cre.findall(node.get_text_contents()):
        includes = includes + self.cre2.findall(i)
    return includes
...confirms the relationship between self.cre and self.cre2 that you guessed: the former matches entire import statements, and the latter matches (and captures) modules therein. (Note the middle (...) capture group in self.cre2 vs. the (?:...) non-capture groups elsewhere throughout self.cre and self.cre2.)
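The capture vs. non-capture distinction matters because of how re.findall behaves: with no capture group it returns the full text of each match, and with exactly one capture group it returns only what that group captured. Here is a minimal sketch of that behavior; the two short patterns are made up for illustration and are not the SCons patterns:

```python
import re

text = "import first : f, second : g;"

# No capture group: findall returns the full text of each match.
whole = re.findall(r"(?:[a-zA-Z0-9_.]+)\s*:", text)

# One capture group: findall returns only what the group captured.
captured = re.findall(r"([a-zA-Z0-9_.]+)\s*:", text)

print(whole)     # ['first :', 'second :']
print(captured)  # ['first', 'second']
```

This is why self.cre can be built entirely from non-capture groups (it hands whole statements to the loop), while self.cre2 needs exactly one capture group around the module name.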
self.cre
Picking up where your Python snippet left off...
import re

import1 = "import first;"
import2 = "import first : f;"
import3 = "import first : f, second : g;"

# p: accepts plain module names separated by commas, nothing else
p = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*;'
pm1 = re.match(p, import1)  # match
if pm1 is not None:
    print("p w/ import1 => " + pm1.group(0))
pm2 = re.match(p, import2)  # no match
if pm2 is not None:
    print("p w/ import2 => " + pm2.group(0))

# p2: allows ": symbol" after the second and later modules, but not after the first
p2 = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*;'
p2m1 = re.match(p2, import1)  # match
if p2m1 is not None:
    print("p2 w/ import1 => " + p2m1.group(0))
p2m2 = re.match(p2, import2)  # no match but should match
if p2m2 is not None:
    print("p2 w/ import2 => " + p2m2.group(0))
p2m3 = re.match(p2, import3)  # no match but should match
if p2m3 is not None:
    print("p2 w/ import3 => " + p2m3.group(0))
...we get the following expected output from the attempts to match the import statements with p and p2:
p w/ import1 => import first;
p2 w/ import1 => import first;
Now consider p2prime, wherein I have made changes to arrive at the pattern I suggested above:
import re

import1 = "import first;"
import2 = "import first : f;"
import3 = "import first : f, second : g;"
import4 = "import first, second, third;"

# p2prime: additionally allows an optional ": symbol" after the first module
p2prime = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;'
p2pm1 = re.match(p2prime, import1)  # match
if p2pm1 is not None:
    print("p2prime w/ import1 => " + p2pm1.group(0))
p2pm2 = re.match(p2prime, import2)  # now a match
if p2pm2 is not None:
    print("p2prime w/ import2 => " + p2pm2.group(0))
p2pm3 = re.match(p2prime, import3)  # now a match
if p2pm3 is not None:
    print("p2prime w/ import3 => " + p2pm3.group(0))
p2pm4 = re.match(p2prime, import4)  # match
if p2pm4 is not None:
    print("p2prime w/ import4 => " + p2pm4.group(0))
With the updated pattern (p2prime) we get the following desired output for its attempts to match the import statements:
p2prime w/ import1 => import first;
p2prime w/ import2 => import first : f;
p2prime w/ import3 => import first : f, second : g;
p2prime w/ import4 => import first, second, third;
This is a fairly long and involved pattern, so I would not be surprised to find opportunities to fine-tune it further; but it does what you want and should provide a solid basis for fine-tuning.
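As one possible direction for that fine tuning, here is a hedged sketch that builds a shorter pattern from named pieces. The names NAME, ITEM, and tidy are my own for illustration. It is not a drop-in equivalent: it treats every comma-separated item uniformly as "module, optionally followed by : symbol", and it allows zero whitespace around the colon, which the pattern above does not do for the first module.

```python
import re

# A module or symbol name, using the same character class as the patterns above.
NAME = r"[a-zA-Z0-9_.]+"
# One import item: a name, optionally followed by ": name".
ITEM = r"{name}(?:\s*:\s*{name})?".format(name=NAME)
# A whole statement: "import", one item, zero or more ", item", then ";".
tidy = re.compile(r"import\s+{item}(?:\s*,\s*{item})*\s*;".format(item=ITEM))

statements = [
    "import first;",
    "import first : f;",
    "import first : f, second : g;",
    "import first, second, third;",
]
for s in statements:
    m = tidy.match(s)
    print("tidy w/ %r => %s" % (s, m.group(0) if m else None))
```

All four statements match in full with this variant. Because it has no nested repetition around optional parts, it should also backtrack less on input that does not match, though I have not measured that.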
self.cre2
For self.cre2, similarly try the following pattern:
(?:import\s)?\s*(?:([a-zA-Z0-9_.]+)(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?)\s*(?:,|;)

Keep in mind, however, that since D's <module> : <symbol> selective imports are just that, selective, capturing only the module names in selective imports may not be what you ultimately need (e.g. vs. capturing both the module and the selected symbol names). As I explained regarding the self.cre regexp I suggested, further fine tuning where warranted should not be difficult.
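To see the two suggested patterns working together, here is a small standalone sketch. The find_include_names function below is a simplified stand-in that takes a plain string rather than an SCons node, and the sample source text is made up for illustration:

```python
import re

# The two patterns suggested in this answer.
cre = re.compile(
    r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?'
    r'(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;'
)
cre2 = re.compile(
    r'(?:import\s)?\s*(?:([a-zA-Z0-9_.]+)(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?)\s*(?:,|;)'
)

def find_include_names(text):
    """Return the module names found in every import statement in text."""
    includes = []
    for statement in cre.findall(text):            # whole import statements
        includes.extend(cre2.findall(statement))   # module names within each
    return includes

source = (
    "import std.stdio;\n"
    "import first : f, second : g;\n"
    "import a, b, c;\n"
)
print(find_include_names(source))
# ['std.stdio', 'first', 'second', 'a', 'b', 'c']
```

Note that the selected symbols (f and g) are dropped, which is the behavior described above; capturing them as well would need a second capture group in cre2 and a change to how the results are consumed.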