Rename pdf files recursively if their names begin with a year

Question

Problems similar to mine have been asked and answered before (for example here). Mine is a bit more specific:

I have a lot of PDF files spread across many dedicated folders. Only some of them are named like this: YEAR_Author(s)Name(s).pdf (e.g., 2011_Smith.pdf, 2011_SmithWesson.pdf). The others are named like this: Author(s)Name(s)2011.pdf (e.g., Smith2011.pdf, SmithWesson2011.pdf). The latter is my preferred name format, and I would like to rename all my files to it. Following the previous example:

2011_Smith.pdf -> Smith2011.pdf
2011_SmithWesson.pdf -> SmithWesson2011.pdf

Is there a way to change such filenames recursively (and smartly)? Do I need to install rename via Homebrew to be able to do that? I am on macOS.

Your question is not very clear... some of your files are named `YEAR_AuthorName.pdf` - how are the others named? You would prefer `AuthorName_YEAR.pdf` or `AuthorNameYEAR.pdf` - given that you're asking, just say which you want. Maybe add an example too. And yes, the `rename` from `homebrew` is ideal, a.k.a. Perl `rename` — Mark Setchell, Dec 04 '21 at 17:31
I've done 100+ answers on that if you want to look through them https://stackoverflow.com/search?tab=newest&q=user%3a2836621%20rename — Mark Setchell, Dec 04 '21 at 17:32
Hi @Alexander, There is a whole hierarchy of folders, on specific topics etc. Does this make sense? — striatum, Dec 04 '21 at 17:44
It wasn't clear from the question, but yes, that makes sense. — Alexander, Dec 04 '21 at 17:53

Mark Setchell · Accepted Answer · 2021-12-09T10:10:22.850

OK, I had a few moments to have a think about this and tried to do it using the rusty old tools that Apple ships (bash v3.2, sed the ancient, and antiquated find) rather than introducing new dependencies.

Let's do this incrementally:

1. find the files
2. iterate over them
3. work out the changes
4. check the changes and make them

First thing is to identify your files. I think the following should find all PDFs that start with 4 digits followed by an underscore:

find /Users/YOURUSER  -iregex ".*/[0-9][0-9][0-9][0-9]_.*\.pdf" 2> /dev/null
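
Before pointing that pattern at a real home directory, it may help to try it on a throwaway tree. The following is only a sketch: the folder and file names (topicA, topicB, notes_2011.txt) are made up for illustration, and it runs find from inside the scratch directory so that the temporary path itself cannot influence the match.

```shell
#!/bin/bash
# Build a scratch tree with invented names, then run the same -iregex pattern on it.
scratch=$(mktemp -d)
mkdir -p "$scratch/topicA" "$scratch/topicB/sub"
touch "$scratch/topicA/2011_Smith.pdf" \
      "$scratch/topicA/Smith2011.pdf" \
      "$scratch/topicB/sub/2011_SmithWesson.PDF" \
      "$scratch/topicB/notes_2011.txt"

# Same pattern as above, but searching "." from inside the scratch directory
matches=$(cd "$scratch" && find . -iregex ".*/[0-9][0-9][0-9][0-9]_.*\.pdf" | sort)
printf '%s\n' "$matches"

# Clean up the scratch tree
rm -rf "$scratch"
```

Only the two names that start with four digits and an underscore should be listed, including the one with an upper-case .PDF extension, since -iregex ignores case. Note that the pattern is matched against the whole path and `.*` can span slashes, so a PDF inside a folder whose name starts with a year and an underscore would also be listed.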

If that's correct, let's pipe it into a while loop and check that it still looks correct:

find /Users/YOURUSER  -iregex ".*/[0-9][0-9][0-9][0-9]_.*\.pdf" -print0 2> /dev/null | 
   while IFS= read -r -d $'\0' path; do
      echo "$path"
   done

Now see if we can twiddle your year off the front and onto the back:

#!/bin/bash

find /Users/YOURUSER  -iregex ".*/[0-9][0-9][0-9][0-9]_.*\.pdf" -print0 2> /dev/null | 
   while IFS= read -r -d $'\0' path; do
      d=$(dirname  "$path")
      f=$(basename "$path")
      # Strip the trailing .pdf extension (any letter case), then move a leading 4-digit year,
      # followed by an underscore, space or dash, from the front to the back
      f=$(echo "$f" | sed -E 's/\.[Pp][Dd][Ff]$//; s/^([0-9]{4})[_ -](.*)/\2\1/')
      new="${d}/${f}.pdf"
      echo "$path becomes"
      echo "-> $new"
      # mv "$path" "$new"
   done

If that all looks good with your files on your system, make a Time Machine backup first, then uncomment the penultimate line (the mv) by removing the # at the start, and run the script again.
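
One thing the loop does not guard against is a name collision: if both 2011_Smith.pdf and Smith2011.pdf sit in the same folder, a plain mv would silently overwrite the latter. Here is a minimal sketch of a guard, assuming a hypothetical helper called safe_rename (my own name, not part of the script above) that could stand in for the mv line:

```shell
#!/bin/bash
# Hypothetical helper: rename $1 to $2 unless something already exists at $2.
safe_rename() {
   # Nothing to do if the name would not change
   if [ "$1" = "$2" ]; then
      return 0
   fi
   # Refuse to overwrite an existing file and report it on stderr
   if [ -e "$2" ]; then
      echo "SKIP: $2 already exists" >&2
      return 1
   fi
   mv -- "$1" "$2"
}
```

Inside the loop, `safe_rename "$path" "$new"` would then take the place of `mv "$path" "$new"`, and any skipped files can be sorted out by hand afterwards.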

In case you are unfamiliar with sed, \1 refers to whatever was captured in the first set of (...) on the left side of the substitution and \2 refers to whatever was captured in the second set of (...) - they are "capture groups".
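
To see the capture groups in isolation, the substitution can be tried on plain strings without touching any files. This is just an illustrative sketch; the function name swap_year is made up here and does not appear in the script above.

```shell
#!/bin/bash
# Hypothetical helper: move a leading 4-digit year to the end of a name.
swap_year() {
   # \1 = the four digits, \2 = everything after the separator (_, space or -)
   printf '%s\n' "$1" | sed -E 's/^([0-9]{4})[_ -](.*)/\2\1/'
}

swap_year "2011_Smith"         # prints Smith2011
swap_year "2011_SmithWesson"   # prints SmithWesson2011
swap_year "Smith2011"          # no leading year, so it is printed unchanged
```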
