Variable expansion in Python regex

Question

I am trying to write a Python script that "cleans" lines of text read from a file, like this:

for i in range(1,10):
    number = 1
    cleanText = re.sub('number.','',line).strip() 
    number = number + 1
    print cleanText

An example file would be:

1. Hello, World
2. Hello earth

What I need to do here is remove the numbering and the dots along with leading blank spaces in one fell swoop. But how on earth can I first perform a simple variable expansion?

Thank you all in advance.

Why are you assigning 1 to `number` and then incrementing it by 1 each iteration? (Also, why is `number` there at all? It's never used.) – Jason Sperske Mar 18 '13 at 20:05
Couldn't you just use a regular expression that matches any number? – Kevin Mar 18 '13 at 20:05
I think you're looking for the PHP feature where a variable referenced in a string is replaced with its value on echo. Python does not have this. There is `printf`-style formatting (which is slightly different), but it wouldn't apply here anyway – Jason Sperske Mar 18 '13 at 20:07
@Kevin Yes, I could simply do it with a regular expression. However, in this case I wish to follow the substitution path, which by the way also works in a number of cases where regexes don't. Jason, I don't know. I thought `number` would somehow be replaced by the value 1. – stratis Mar 18 '13 at 20:08
There is this: http://stackoverflow.com/a/4840617/16959 which looks like variable expansion, but you are actually passing arguments to a `format` method that parses a string and substitutes values as it finds them – Jason Sperske Mar 18 '13 at 20:09
I've read about locals() in the past but haven't quite figured out how to use it properly. Maybe an example? – stratis Mar 18 '13 at 20:11
@JasonSperske this is about the worst thing one could do to solve this incredibly simple problem. – l4mpi Mar 18 '13 at 20:12
@l4mpi, I agree. I'm just trying to provide some continuity if Konos5 has an idea of how strings work in Python based on other programming languages – Jason Sperske Mar 18 '13 at 20:13
@JasonSperske To be honest, I am just learning the basics, and coming from a bash environment it came as a surprise how hard it can be to perform a plain replacement. Anyway, thanks for your time. I think I get the picture now. – stratis Mar 18 '13 at 20:19

score 3 · Answer 1 · answered Mar 18 '13 at 20:15

If your file format is guaranteed to be like you said:

1. Hello, World
2. Hello earth

You don't even need to use a regex, you could just use split and join:

clean_line = ' '.join(line.split(' ')[1:]).lstrip()

>>> ' '.join("1. Hello, world".split(' ')[1:])
'Hello, world'
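
A minimal sketch of the same idea with `split(None, 1)`, which splits on any run of whitespace at most once, so the text after the number is kept exactly as written. The helper name `strip_number` is made up for illustration. Like the one-liner above, it assumes every line really starts with a number; returning short lines unchanged is my own assumption about what you would want.

```python
def strip_number(line):
    # Split once on any whitespace run: ['1.', 'Hello, World'].
    parts = line.strip().split(None, 1)
    # Assumption: a line with fewer than two fields has no number to drop.
    if len(parts) < 2:
        return line.strip()
    return parts[1]

print(strip_number("  1. Hello, World\n"))  # Hello, World
print(strip_number("2.   Hello   earth"))   # Hello   earth
```

Because the first field is dropped blindly, a line such as "Hello earth" would lose "Hello", so this only fits files where the format is guaranteed.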

Or, if you still wanted to do substitution, this replace-based code may work:

number = 1
for line in file_handle:
  clean_line = line.replace("%d. " % number, "").lstrip()
  number += 1
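
A sketch of the same loop written with `enumerate`, as the comment below suggests, so the counter cannot drift out of step with the lines. The function name `strip_expected_numbers` is hypothetical, and it assumes the lines are numbered consecutively from 1. It checks the prefix with `startswith` rather than `replace`, so a "1. " appearing later in the line is left alone.

```python
def strip_expected_numbers(lines):
    cleaned = []
    # enumerate(..., 1) yields (1, first line), (2, second line), ...
    for number, line in enumerate(lines, 1):
        prefix = "%d." % number
        text = line.strip()
        # Only drop the prefix when the line really starts with it.
        if text.startswith(prefix):
            text = text[len(prefix):].lstrip()
        cleaned.append(text)
    return cleaned

print(strip_expected_numbers(["1. Hello, World\n", "2. Hello earth\n"]))
# ['Hello, World', 'Hello earth']
```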

Valdogg21

or, `for number, line in enumerate(file_handle, 1): ...` – user4815162342 Mar 18 '13 at 20:15
@Valdogg21 Your second solution with ``replace()`` is unpythonically dumb. Your first solution is based on a good idea, but read the doc on ``strip()`` and write: ``line.lstrip().split(None,1)[1]`` for a line, and ``print '\n'.join(line.lstrip().split(None,1)[1] for line in text.splitlines())`` for a text – eyquem Mar 18 '13 at 22:39

score 2 · Accepted Answer · answered Mar 18 '13 at 20:11

As others said, you should simply use a regular expression that matches any number, such as r"\d" or r"\d+". However, for learning purposes, here is the answer to what you did ask.

The closest useful equivalent of "variable expansion" is the string formatting operator:

cleanText = re.sub('%d.' % number, '', line).strip()

You could also use str(number) + '.' to achieve the same effect. There are several more problems with your code:

- Your loop is wrong: if you're iterating over range(1, 10), you don't need to increment number manually.
- You probably meant range(1, 11).
- `.` in regular expression syntax matches any character; you want `\.`.

A cleaned-up version might look like this:

cleanText = line.strip()
for i in xrange(1, 11):
    cleanText = re.sub(r'%d\.' % i, '', cleanText).strip()
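
If the number really has to come from a variable, one way to sketch it is to format the number into a string, pass it through `re.escape` so the dot is taken literally, and anchor the pattern at the start of the line. The helper name `remove_numbering` is made up for illustration, and the choice to also trim trailing whitespace is my own assumption.

```python
import re

def remove_numbering(line, number):
    # re.escape() makes the dot (and anything else special) literal.
    prefix = re.escape("%d." % number)
    # ^\s* anchors at the start; the trailing \s* eats the blanks after the dot.
    pattern = r"^\s*" + prefix + r"\s*"
    return re.sub(pattern, "", line).rstrip()

print(remove_numbering("  1. Hello, World\n", 1))  # Hello, World
print(remove_numbering("10. Hello earth", 1))      # 10. Hello earth
```

Anchoring with `^` matters here: without it, the pattern for 1 would also match the "1." inside "11." or "21.".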

user4815162342

In the third line, what's the r used for before '% ? – stratis Mar 18 '13 at 20:15
@Konos5 It's a so-called raw string, meaning backslashes are treated as literals and are not used for escape sequences (except for escaping the enclosing quote type). See http://docs.python.org/2/reference/lexical_analysis.html#string-literals – l4mpi Mar 18 '13 at 20:17
@Konos5 The **r** in front of ``%d\.`` is completely unjustified. Such an **r** disables the backslash escape sequences OF A STRING LITERAL, such as ``\n \t \a \b \\`` etc. The sequence ``\.`` is not an escape sequence at the string level: ``.`` is a dot and ``\.`` is a backslash followed by a dot, full stop. ``\.`` is an escape sequence only at the level of the regex pattern. Raw strings are easier to use when the regex pattern itself has to contain a backslash. Apart from that case, using a raw string gains nothing – eyquem Mar 18 '13 at 22:01
The idea is that you don't have to *think* about which characters following the backslash are escape sequences. Some escape sequences are obscure or rarely used (`\a`, `\b`, `\v`), some subtly differ between languages (Emacs and GCC grok `\e`, Python doesn't), and some are interpreted differently (shells interpret `\.` as `.`, Python as `\.`). In a raw string, you just type \ knowing that it won't be interpreted as anything except the literal \ character. This is why it is a good idea to use raw strings for all regexes. – user4815162342 Mar 18 '13 at 22:11
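
The difference is easy to check in the interpreter. This is a small sketch, not taken from any of the answers: it compares raw and normal literals, then shows one case where the choice changes what the regex engine sees.

```python
import re

# A raw literal keeps the backslash; these two spellings are the same string.
assert r"\." == "\\."
assert len(r"\n") == 2   # backslash + 'n'
assert len("\n") == 1    # a single newline character

# Where it matters: "\b" is a backspace character in a normal literal,
# but a word boundary when the regex engine sees backslash + 'b'.
plain = re.search("\bcat\b", "a cat sat")
raw = re.search(r"\bcat\b", "a cat sat")
print(plain)        # None
print(raw.group())  # cat
```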

score 0 · Answer 3 · answered Mar 18 '13 at 20:24

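import re

# Strip a leading "N." prefix (and the whitespace around it) from each line.
with open('line', 'r') as fp:
    for line in fp:
        match = re.match(r'\s*[0-9]+\.\s*(.*)', line)
        if match:
            print(match.group(1))
        else:
            print(line.rstrip('\n'))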
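
A variation on the same any-number idea using `re.sub` instead of `re.match`, sketched on an in-memory list so it can be run without a file. The names `NUMBERING` and `clean` are made up for illustration, and lines without a leading number are assumed to pass through with only trailing whitespace trimmed.

```python
import re

# Optional leading blanks, one or more digits, a literal dot, trailing blanks.
NUMBERING = re.compile(r"^\s*\d+\.\s*")

def clean(line):
    return NUMBERING.sub("", line).rstrip()

lines = ["1. Hello, World\n", "  2. Hello earth\n", "No number here\n"]
print([clean(line) for line in lines])
# ['Hello, World', 'Hello earth', 'No number here']
```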

Harkirat
