Remove A String of Numbers or Text From The Beginning of A Filename

Question

On macOS Sierra (MacBook Pro), how would one batch-remove the random strings of numbers from the beginning of .pdf filenames so that the files can be viewed in alphabetical order? For example, this is a sample of how these files appear in Finder, with the numbers affecting the alphabetical ordering:

22169203 The Greatest Trade Ever by Gregory Zuckerman Excerpt.pdf

22681256 Sample PDF Getting the Money.pdf

24225401 Indian Film Industry.pdf

24309156 Start and Run Your Own Coffee Shop and Lunch Bar.pdf

24375874 5 Steps To Training in Classical Music Riyaz.pdf

24538682 100 Verbal Signals.pdf

24861279 100 Greatest Songs.pdf

24975456 Appointment Book Preview.pdf

25169832 How to Start a Business for Free.pdf

25283738 Building Modern PR Campaigns.pdf

26672829 Biggest Stock Market Plays in the World.pdf

26852793 Free eBook on Secret Practical Guide for Stocks Beginners by Nnadi Jane.pdf

27012881 The Value and Price of Food An excerpt from Terra Madre by Carlo Petrini.pdf

27114721 Social Media Marketing Services.pdf

27881968 Introduction 3.pdf

28097572 Film PDF.pdf

Gerard · Accepted Answer · 2017-12-21T01:03:24.943

This little shell script renames the files, removing the numbers at the front of each name using the regex defined at the top. Note that the script should be run from the directory that contains the PDFs; otherwise you will need to change the path in the for loop.

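#!/bin/bash
# Group 1: the leading digits plus the single character that follows them.
# Group 2: the rest of the name, up to and including the .pdf extension.
regex="([0-9]*.)(([a-zA-Z0-9]*|.*)*\.pdf)"
for filename in *.pdf; do
    if [[ $filename =~ $regex ]]; then
        # Rename the file to the text captured by group 2.
        mv "$filename" "${BASH_REMATCH[2]}"
    fi
done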

Explanation

A detailed breakdown of the regular expression used can be found [here](https://regex101.com/r/KqpMKX/2). It explains things more cleanly than I could in this post.

The mv command moves its first argument (in this case the PDF under its original name) to the destination given by its final argument (in this case the second capturing group of the regular expression). BASH_REMATCH is an array that stores the results of the most recent regular expression match. Here we use index 2, which holds the name of the file minus the prefixed numbers. This article explains bash regular expressions in more detail.
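To make the role of BASH_REMATCH more concrete, here is a small sketch that applies the same regex to one of the sample filenames from the question and prints the captured pieces instead of renaming anything. The variable names `whole`, `prefix` and `newname` are only for illustration and are not part of the script above.

```shell
#!/bin/bash
# Same regex as in the answer's script.
regex="([0-9]*.)(([a-zA-Z0-9]*|.*)*\.pdf)"

# One of the sample names from the question; note the digits inside the title.
filename="24861279 100 Greatest Songs.pdf"

if [[ $filename =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}    # index 0: the entire matched text
    prefix=${BASH_REMATCH[1]}   # index 1: first group (digits plus one character)
    newname=${BASH_REMATCH[2]}  # index 2: second group (the name to keep)
    printf 'whole match : [%s]\n' "$whole"
    printf 'group 1     : [%s]\n' "$prefix"
    printf 'group 2     : [%s]\n' "$newname"
else
    printf 'no match for: [%s]\n' "$filename"
fi
```

Group 1 can only hold the leading run of digits and the one character after it, so the "100" that belongs to the title ends up in group 2 and survives the rename.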

edited Dec 21 '17 at 01:03

answered Dec 20 '17 at 20:11

Thank you Gerard. I have just tested this but there are no changes to the filenames: the numbers are still there at the start of the filename. I have moved the .pdfs to the same directory as the default terminal directory. Are there any variables I need to change in the script you have provided? – Nat Dec 20 '17 at 21:01
@nat, Sorry about that. I tested the original script using Windows 10's bash shell and the script worked. However, after testing using Git Bash, I realized that the regex in Git Bash (and consequently normal Linux/Mac bash) supports fewer special operators. I updated the answer with a new regex that works in both Git Bash and Windows bash. – Gerard Dec 20 '17 at 22:51
no need to apologise - excited to be working through a solution with you. I am getting the following error in the terminal: wiilds-MBP:~ wiild$ /Users/wiild/RemoveNumbersPDF ; exit; /Users/wiild/RemoveNumbersPDF: line 4: : command not found /Users/wiild/RemoveNumbersPDF: line 5: : command not found /Users/wiild/RemoveNumbersPDF: line 6: : command not found /Users/wiild/RemoveNumbersPDF: line 7: : command not found logout Saving session... ...copying shared history... ...saving history...truncating history files... ...completed. – Nat Dec 20 '17 at 23:04
I have tried this with your revised script and also revised the second line to "for filename in ./*.pdf; do" in case that was an issue, but to no avail. – Nat Dec 20 '17 at 23:06
@nat Hmm that's odd... It looks like your terminal is misinterpreting the script. Here's how I would run the script, just to make sure we are on the same page. 1) Save the snippet in the directory with all of the PDFs you want to rename as `rename.sh`. 2) Open a new terminal window and cd to that same directory. 3) Call the script using `./rename.sh` and let me know what happens. Make sure to keep the `#!/bin/bash` comment at the top as well. – Gerard Dec 20 '17 at 23:15
I tried that but the same error pops up. Interestingly enough, the first script you sent didn't result in any errors - terminal seemed to read it fine, it just didn't rename the files. This time it can't seem to read all the lines, which is strange. I followed the steps in here as well keeping the #!/bin/bash command intact: https://www.hastac.org/blogs/joe-cutajar/2015/04/21/how-make-simple-bash-script-mac and I am running everything from the same directory. The terminal, the shell script and the .pdf files are in the same directory. Sorry for the trouble, and I appreciate your help. – Nat Dec 20 '17 at 23:44
Gerard - we have a result, success! THANK YOU. The issue was in the final line "fi" which should have been "fi " with spaces I believe - either way, that solved the problem in the end. I'm not sure how to edit the above script to do that, so feel free to do the honours! As I'm a total newbie to this, could you possibly explain briefly how the regex works in this code, i.e. how does the code know to remove only the string of numbers, no matter the length, before the filename even if the filename includes numbers, e.g. 587487587 48 Laws of Power.pdf? Thank you once again for everything. Nat – Nat Dec 21 '17 at 00:30
Sweet, I'll edit the script! The regex really has two parts: `([0-9]*.)` and `(([a-zA-Z0-9]*|.*)*\.pdf)`. The first part matches the numbers at the front of the PDF name and the following space. `[0-9]` looks for any single digit 0-9, and the `*` makes it match as many times as needed, including 0 times. The `.` character matches any single character, which I used to match the space after the prefix numbers. The second part matches any letter or number (a-z, A-Z, 0-9) zero or more times, or, via `.*`, any other characters such as the spaces. And the `\.pdf` at the end is hardcoded to match the `.pdf` extension. – Gerard Dec 21 '17 at 00:48
A more thorough explanation can be found [here](https://regex101.com/r/KqpMKX/2). It's what I use for testing/designing regular expressions and it gives very granular explanations. – Gerard Dec 21 '17 at 00:50
I added the explanation directly onto the answer so that you and anyone else can find it more easily. Glad to help Nat. – Gerard Dec 21 '17 at 01:07
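
For names like 587487587 48 Laws of Power.pdf, a stricter pattern may be worth considering. Because `[0-9]*` can match zero digits and the regex is not anchored, the answer's pattern would also match a PDF that has no numeric prefix and drop its first character. The sketch below is one possible variant, not the answer's script: it assumes every prefix is one or more digits followed by exactly one space, `strip_numeric_prefix` is a hypothetical helper name, and the loop only prints the mv commands as a dry run.

```shell
#!/bin/bash
# Hypothetical helper: print the name without a leading "digits + one space"
# prefix, or print the name unchanged when there is no such prefix.
strip_numeric_prefix() {
    local name=$1
    # ^ and $ anchor the match; + requires at least one digit.
    local regex='^[0-9]+ (.+\.pdf)$'
    if [[ $name =~ $regex ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$name"
    fi
}

for filename in *.pdf; do
    [[ -e $filename ]] || continue   # the glob matched nothing
    newname=$(strip_numeric_prefix "$filename")
    if [[ $newname != "$filename" ]]; then
        # Dry run: remove "echo" to rename. -n refuses to overwrite existing files.
        echo mv -n -- "$filename" "$newname"
    fi
done
```

Run it once only: after the first pass a name such as 48 Laws of Power.pdf itself starts with digits and a space, so a second pass would strip the "48" as well.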
