4

Sorry, but I can't figure this out from the Python documentation or any of the stuff I've found from Google.

So, I've been working on renaming files with code 99% from one of the awesome helpers here at StackOverflow.

I'm working on putting together a renaming script that (and this is what I got help with from someone here) works with the name (not the extension).

I'm sure I'll come up with more replacements, but my problem at the moment is that I can't figure out how to do more than one re.sub. Current Code (Replaces dots with spaces):

import os, shutil, re

def rename_file (original_filename):
    name, extension = os.path.splitext(original_filename)
    #Remove Spare Dots
    modified_name = re.sub("\.", r" ", name)
    new_filename = modified_name + extension
    try:
        # moves files or directories (recursively)
        shutil.move(original_filename, new_filename)
    except shutil.Error:
        print ("Couldn't rename file %(original_filename)s!" % locals())
[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]

Hoping to also

re.sub("C126", "Perception", name)
re.sub("Geo1", "Geography", name)

Also, it'd be awesome if I could have it capitalize the first letter of any word except "and|if"

I tried

modified_name = re.sub("\.", r" ", name) && re.sub(... 

but that didn't work; neither did putting them on different lines. How do I do all the subs and stuff I want to do/make?

user
  • 1
    I'd suggest, if you're going to be using the same regexp many times, to compile it once, and re-use it all you like. That will make your script much faster. Take a look at [re.compile()](http://docs.python.org/library/re.html#re.compile) – El Barto Mar 11 '12 at 00:38
  • 6
    You've got some correct answers below, but I'd point out that there's absolutely no reason to use regex to do these substitutions. Plain `name = name.replace('.', ' ')` etc would do the job. – Daniel Roseman Mar 11 '12 at 00:40
  • Safety hint: Always use raw strings with regexps containing a backslash. You don't actually need it with "\.", but it's standard defensive programming; your next RE will need it. – alexis Mar 11 '12 at 11:39
  • Just noting that this is a [continuation of a previous question.](http://stackoverflow.com/questions/9577135/renaming-files-according-to-a-set-of-rules) – Li-aung Yip Mar 13 '12 at 05:20

5 Answers

11

Just operate over the same string over and over again, replacing it each time:

name = re.sub(r"\.", r" ", name)
name = re.sub(r"C126", "Perception", name)
name = re.sub(r"Geo1", "Geography", name)

@DanielRoseman is right though: these are literal patterns that don't need regexes to be described, found, or replaced. You can use timeit to demonstrate that plain old replace() is preferable:

In [16]: timeit.timeit("test.replace('asdf','0000')",setup="test='asdfASDF1234'*10")
Out[16]: 1.0641241073608398

In [17]: timeit.timeit("re.sub(r'asdf','0000',test)",setup="import re; test='asdfASDF1234'*10")
Out[17]: 6.126996994018555
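
For reference, here is a minimal sketch of the same three substitutions written with plain `str.replace()`. The helper name `clean_name` and the sample filename are made up for illustration; they are not from the original script.

```python
def clean_name(name):
    # str.replace() returns a new string, so reassign the result each time.
    name = name.replace(".", " ")
    name = name.replace("C126", "Perception")
    name = name.replace("Geo1", "Geography")
    return name


print(clean_name("C126.Geo1.notes"))  # Perception Geography notes
```

In the question's script this would replace the single `re.sub` line, leaving the `os.path.splitext` and `shutil.move` calls as they are.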
Eduardo Ivanec
  • @Eduardo, you should use a raw string for r"\.". It doesn't matter here, but it's important to use in an example. (Stupid SO wouldn't let me do a one-letter edit...) – alexis Mar 11 '12 at 11:36
5

Well, each time you call re.sub(), it returns a new, changed string. So, if you want to keep modifying each new string, keep assigning the new strings to the same variable name. Essentially, don't think of yourself as modifying the same string over and over - instead, think of yourself as modifying a new string each time.

Example: If you're using the string "lol.Geo1",

newString = re.sub(r"\.", r" ", originalString)

will return the string "lol Geo1", and assign it to newString. Now, if you want to change that new string, do your next substitution and it will return another string, which you can put to "newString" again -

newString = re.sub("Geo1", "Geography", newString)

Now, newString evaluates to "lol Geography". With each substitution, you are creating a new string, not the same one. That's why

modified_name = re.sub("\.", r" ", name) && re.sub(... 

didn't work. For one thing, `&&` is not a Python operator, so the line is a syntax error. Beyond that, `re.sub("\.", r" ", name)` returns one string, `re.sub(...)` returns another string, and so on, each of those strings having only its own substitution applied to the original string, like this:

modified_name = "lol Geo1" && "lol.Geography"...

So, to get it to work, follow the other poster's suggestions - just keep repeating the assignment with each substitution you want, assigning the substituted newString to itself, until you've finished all of your substitutions.
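
To make the difference concrete, here is a small sketch using the "lol.Geo1" example from above. The variable names `only_dots` and `only_geo` are just illustrative.

```python
import re

original = "lol.Geo1"

# Two independent calls on the original: each result carries only one change.
only_dots = re.sub(r"\.", " ", original)          # "lol Geo1"
only_geo = re.sub("Geo1", "Geography", original)  # "lol.Geography"

# Chaining: feed the previous result into the next call.
new_string = re.sub(r"\.", " ", original)
new_string = re.sub("Geo1", "Geography", new_string)  # "lol Geography"

print(only_dots, only_geo, new_string, sep=" | ")
```

Note that Python's `and` would not help either: with two non-empty strings it simply evaluates to the second one, it does not merge the two results.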

Hopefully, that's a clear explanation. Feel free to ask questions. :)

Marshall Conover
  • What am I doing wrong? `import os, shutil, re def rename_file (original_filename): name, extension = os.path.splitext(original_filename) #Remove Spare Dots modified_name = re.sub("\.", r" ", name) modified_name = re.sub("\-", r" ", modified_name) new_filename = modified_name + extension try: # moves files or directories (recursively) shutil.move(original_filename, new_filename) except shutil.Error: print ("Couldn't rename file %(original_filename)s!" % locals()) [rename_file(f) for f in os.listdir('.') if not f.startswith('.')]`` – user Mar 11 '12 at 01:08
  • @RobinHood could you put that in your individual post, or in a [pastebin](http://pastebin.com/)? It's hard to check out python without the indentation :P – Marshall Conover Mar 11 '12 at 01:13
  • Yea, I'm sorry, I dunno how to put it in the correct format on here. Pastebin: http://pastebin.com/mEBKr5m4 – user Mar 11 '12 at 01:26
  • @RobinHood - I got it indented, and it worked for me - I put the script in its own directory, made a file "lol.g-eo", ran the script from idle, and it wrote back "lol g eo". How are you trying to run the script? – Marshall Conover Mar 11 '12 at 01:26
  • I cd to the directory and then I run "python /root/rename.py". The example I gave works, but if I add on the other lines it fails. Has it been tried as part of this whole script? – user Mar 11 '12 at 01:31
  • @RobinHood - the indentation in that pastebin is incorrect; the second re.sub should be on the same indent as the first, [like this](http://pastebin.com/yyMYJ4bR). Is python complaining about an indent error? If so, that's the issue. – Marshall Conover Mar 11 '12 at 01:45
  • Thanks man! I would have sworn that's what I was doing!! You're fantastic. I'll look around and (if I can't find something) be asking about those capitalization things soon. =D – user Mar 11 '12 at 02:10
1
modified_name = re.sub(r"\.", r" ", name)
modified_name = re.sub("C126", "Perception", modified_name)
modified_name = re.sub("Geo1", "Geography", modified_name)

Pass the output of one as the input of the next.
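
If the list of replacements keeps growing, one possible way to organize this is a table of rules applied in a loop, with each pattern compiled once as suggested in the comments on the question. This is only a sketch: `RULES`, `apply_rules`, and the sample filename are hypothetical names, not part of the original script.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r"\."), " "),
    (re.compile(r"C126"), "Perception"),
    (re.compile(r"Geo1"), "Geography"),
]


def apply_rules(name, rules=RULES):
    # The output of one substitution becomes the input of the next.
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name)
    return name


print(apply_rules("C126.Geo1.week.3"))  # Perception Geography week 3
```

Adding a new replacement then means adding one line to the table rather than another assignment in the function body.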

Amber
0

You could also nest one call inside another, but this is considered less readable:

name = re.sub(r"Geo1", "Geography", 
              re.sub(r"C126", "Perception", 
                     re.sub(r"\.", r" ", name)
                     )
              )
Alen Paul Varghese
0

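Just put a vertical bar between the patterns :) Note that this replaces every match with the same string (here a space), so it only fits when all the patterns share one replacement.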

name = re.sub(r"\.|C126|Geo1", r" ", name)
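
To give each alternative its own replacement in a single pass, one option is to pass a function as the replacement argument of `re.sub`, which the `re` module supports. The sketch below assumes a lookup dictionary; `REPLACEMENTS`, `PATTERN`, and `replace_all` are illustrative names.

```python
import re

# Hypothetical lookup table: literal text to find -> text to substitute.
REPLACEMENTS = {".": " ", "C126": "Perception", "Geo1": "Geography"}

# re.escape() makes "." match a literal dot. Longer keys are listed first
# so that a longer key wins over any shorter key that is its prefix.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def replace_all(name):
    # The function receives each match object and returns its replacement.
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], name)


print(replace_all("C126.Geo1"))  # Perception Geography
```

Unlike the chained version, this scans the string once, so text produced by one replacement is never re-examined by another.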