-1

I am trying to write a Python script that "cleans" lines of text read from a file, like this:

for i in range(1,10):
    number = 1
    cleanText = re.sub('number.','',line).strip() 
    number = number + 1
    print cleanText

An example file would be:

1. Hello, World
2. Hello earth

What I need to do here is remove the numbering and the dots along with leading blank spaces in one fell swoop. But how on earth can I first perform a simple variable expansion?

Thank you all in advance.

Kevin
stratis
  • Why are you assigning 1 to `number` and then incrementing it by 1 each iteration? (also why is `number` there at all, it's never used) – Jason Sperske Mar 18 '13 at 20:05
  • Couldn't you just use a regular expression that matches any number? – Kevin Mar 18 '13 at 20:05
  • I think you're looking for the PHP feature where a variable referenced in a string is replaced with its value on echo. Python does not have this. There is `printf` (which is slightly different), but it wouldn't apply here anyway – Jason Sperske Mar 18 '13 at 20:07
  • @Kevin Yes, I could simply do it with a regular expression. However in this case I wish to follow the substitution path which btw also works in a number of cases where regexes don't. Jason I don't know. I thought number would somehow be replaced by the value of 1. – stratis Mar 18 '13 at 20:08
  • There is this: http://stackoverflow.com/a/4840617/16959 which looks like variable expansion, but you are actually passing arguments to a `format` method that is parsing a string and substituting values as it finds them – Jason Sperske Mar 18 '13 at 20:09
  • I've read about locals() in the past but haven't quite figured out how to properly use it. Maybe an example..? – stratis Mar 18 '13 at 20:11
  • @JasonSperske this is about the worst thing one could do to solve this incredibly simple problem. – l4mpi Mar 18 '13 at 20:12
  • @l4mpi, I agree, I'm just trying to provide some continuity if Konos5 has an idea of how Strings worked in Python based on other programming languages – Jason Sperske Mar 18 '13 at 20:13
  • @JasonSperske To be honest I am just learning the basics and coming from a bash environment it came to me as a surprise how hard it can be to perform a plain replacement. Anyway thanks for your time. I think I get the picture now. – stratis Mar 18 '13 at 20:19

3 Answers

3

If your file format is guaranteed to be like you said:

1. Hello, World
2. Hello earth

You don't even need a regex; you could just use split and join:

clean_line = ' '.join(line.split(' ')[1:]).lstrip()

>>> ' '.join("1. Hello, world".split(' ')[1:])
'Hello, world'

Or, if you still wanted to do substitution, this replace-based code may work:

number = 1
for line in file_handle:
  clean_line = line.replace("%d. " % number, "").lstrip()
  number += 1
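
Both approaches above can be tried without a file. The following is only a minimal sketch, written for Python 3 (the thread itself uses Python 2), and it assumes every line starts with its own number followed by a dot and a space. The names `lines`, `clean_by_split`, and `clean_by_replace` are hypothetical helpers introduced for illustration.

```python
# Stand-in for the lines a real file handle would yield (assumed format).
lines = ["1. Hello, World\n", "2. Hello earth\n"]


def clean_by_split(line):
    """Drop the first whitespace-separated token (the 'N.' marker)."""
    parts = line.strip().split(None, 1)
    # If the line holds only a marker, nothing is left after removing it.
    return parts[1] if len(parts) > 1 else ""


def clean_by_replace(line, number):
    """Remove the first occurrence of 'N. ' for the expected line number."""
    return line.replace("%d. " % number, "", 1).strip()


for number, line in enumerate(lines, 1):
    print(clean_by_split(line))
    print(clean_by_replace(line, number))
```

Passing `1` as the count to `replace` limits the substitution to the first match, so text such as "1. " appearing later in the line is left alone.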
Valdogg21
  • or, `for number, line in enumerate(file_handle, 1): ...` – user4815162342 Mar 18 '13 at 20:15
  • @Valdogg21 Your second solution with ``replace()`` is unpythonically dumb. Your first solution is based on a good idea, but read the doc on ``strip()`` and write: ``line.lstrip().split(None,1)[1]`` for a line, and ``print '\n'.join(line.lstrip().split(None,1)[1] for line in text.splitlines(1))`` for a text – eyquem Mar 18 '13 at 22:39
2

As others said, you should simply use a regular expression that matches any number, such as r"\d" or r"\d+". However, for learning purposes, here is the answer to what you did ask.

The closest useful equivalent of "variable expansion" is the string formatting operator:

cleanText = re.sub('%d.' % number, '', line).strip()

You could also use str(number) + '.' to achieve the same effect. There are several more problems with your code:

  • your loop is wrong; if you're iterating over range(1, 10), then you don't need to increment number manually.

  • you probably meant range(1, 11).

  • . in regular expression syntax matches any character (except a newline, by default); to match a literal dot you want \.
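
The effect of the unescaped dot is easy to see on a small input. This is a minimal sketch for Python 3, not the asker's actual data; the sample strings and the names `loose`, `strict`, and `cleaned` are made up for illustration.

```python
import re

number = 1

# Unescaped dot: the pattern '1.' matches "1" followed by ANY character,
# so "1x" is removed even though it is not a numbering marker.
loose = re.sub('%d.' % number, '', "1x Hello")

# Escaped dot: only a literal "1." would be removed, so nothing changes here.
strict = re.sub(r'%d\.' % number, '', "1x Hello")

# On a properly numbered line the escaped pattern removes just the marker.
cleaned = re.sub(r'%d\.' % number, '', "1. Hello, World").strip()

print(repr(loose))
print(repr(strict))
print(repr(cleaned))
```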

A cleaned-up version might look like this:

cleanText = line.strip()
for i in xrange(1, 11):
    # Interpolate the current number into the pattern; \. matches a literal dot
    cleanText = re.sub(r'%d\.' % i, '', cleanText)
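
For comparison, the single-regex approach recommended at the start of this answer could be sketched as follows. It is written for Python 3 and assumes the numbering always sits at the start of the line; `strip_numbering` is a hypothetical helper name, not something from the question.

```python
import re


def strip_numbering(line):
    """Remove a leading 'N.' marker (any run of digits) and surrounding whitespace."""
    return re.sub(r'^\s*\d+\.\s*', '', line).strip()


for sample in ["1. Hello, World", "2. Hello earth", "12.   Hi", "no number"]:
    print(strip_numbering(sample))
```

Because `\d+` matches any run of digits, no loop over the possible numbers is needed, and lines without a marker pass through unchanged.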
user4815162342
  • In the third line, what's the r used for before '% ? – stratis Mar 18 '13 at 20:15
  • @Konos5 It's a so-called raw string, meaning backslashes are treated as literals and are not used for escape sequences (except for escaping the enclosing quote type). See http://docs.python.org/2/reference/lexical_analysis.html#string-literals – l4mpi Mar 18 '13 at 20:17
  • @Konos5 The **r** in front of ``%d\.`` is completely unjustified. Such an **r** deactivates the backslash in string-level escape sequences such as ``\n \t \a \b \\``. The sequence ``\.`` is not an escape sequence at the string level: ``.`` is a dot and ``\.`` is a backslash and a dot. Period. ``\.`` is an escape sequence at the level of a regex pattern. Raw strings are easier to use when the string is a regex pattern that has to contain a backslash. Apart from that case, using raw strings brings nothing – eyquem Mar 18 '13 at 22:01
  • The idea is that you don't have to *think* about which characters following the backslash are escape sequences. Some escape sequences are obscure or rarely used (`\a`, `\b`, `\v`), some subtly differ between languages (Emacs and GCC grok `\e`, Python doesn't), and some are interpreted differently (shells interpret `\.` as `.`, Python as `\.`). In a raw string, you just type \ knowing that it won't be interpreted as anything except the literal \ character. This is why it is a good idea to use raw strings for all regexes. – user4815162342 Mar 18 '13 at 22:11
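
The raw-string point discussed in these comments can be checked directly. This is a small Python 3 sketch, with `pattern` and `result` as made-up names: it compares ordinary and raw literals and then uses a raw pattern with `re.sub`.

```python
import re

# In an ordinary literal "\n" is a single newline character;
# in a raw literal it is two characters: a backslash and the letter n.
assert len("\n") == 1
assert len(r"\n") == 2

# A regex-level escape spelled with a doubled backslash is the same
# string as its raw spelling; the raw form is simply easier to read.
assert "\\." == r"\."
assert "\\d+" == r"\d+"

pattern = r"\d+\."
result = re.sub(pattern, "", "1. Hello, World").strip()
print(result)
```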
0
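import re

# 'line' is the name of the input file used in this answer
with open('line', 'r') as fp:
    for line in fp:
        # Optional digits, a literal dot, then capture the rest of the line
        match = re.match(r'[0-9]*\.(.*)', line)
        if match:
            print(match.group(1).strip())
        else:
            print(line.rstrip('\n'))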