1

I have huge file archives that I host on my old-skool BBS. The [Mystic] software isn't as forgiving or capable as Linux with long filenames or extended characters.

Filenames should be less than 80 characters long.

Filenames should only have chars A-Z & 1-9. No "! @ # $ % ^ &", etc - nor letters with tildes or carets over them.

Here is a sample of what one collection's directory looks like:

pi@bbs:/mnt/Beers4TB/opendirs/TDC19 $ ls -all
total 28
drwxrwxr-x  6 pi pi 4096 Sep 16 08:08 .
drwxrwxr-x 11 pi pi 4096 Oct  6 15:04 ..
drwxrwxr-x  2 pi pi 4096 Sep 13 20:13 ANSi
drwxrwxr-x  2 pi pi 4096 Oct  6 21:16 Drivers
drwxrwxr-x 10 pi pi 4096 Sep 16 08:12 Games
-rw-rw-r--  1 pi pi 1056 Sep 13 20:12 INTRO.TXT
drwxrwxr-x  2 pi pi 4096 Sep 16 08:08 ListsNotes

And within /subdirectories they may go 2, 3 or more deep.

Here is a sample of what some files are currently named:

pi@bbs:/mnt/Beers4TB/opendirs/TDC19/Games/Applications $ ls M*
'Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip'
'Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip'
'Metaltech- Battledrome Game Editor (1994)(Sierra On-Line, Inc.) [Utility].zip'
'Might and Magic III Character Editor (1991)(Blackbeard'\''s Ghost) [Utility].zip'
'Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip'

I have worked on some things that show promise... this echo/sed command removes SOME high chars:

echo "Might and Magic III Character Editor (1991)(Blackbeard'''s Ghost) [Utility].zip" | sed -r -e 's/\x27+//g' -e 's/[][")(]//g' -e 's/[ ]+//g'

(It renames the file:) Might_and_Magic_III_Character_Editor_1991_Blackbeards_Ghost_Utility.zip

Then, I have a command that will rename ONE entire /subdirectory, but it DOESN'T remove any characters:

for f in *.zip; do mv "${f}" "${f//[][\")( ]/_}"; done

That's good... but I have to get rid of the high characters... and this method sometimes adds multiple spaces in filenames, which counts against the 80-character filename limit, and there are no safeguards built in...

I worked on adding in support for going thru multiple /subdirectories, but I KNOW that my syntax is still wrong... you can, however, see what I was attempting to do:

P=$(pwd); for D in $(find . -maxdepth 1 -type d); do cd $D; for f in *.zip; do mv "${f}" "${f//[][\")( ]/_}"; cd $P; done

So, in closing - I'm open to any Linux commands that will: Remove any characters that are NOT A-Z or 1-9. Remove any extra spaces in filenames. Make sure filenames are only 80 characters long max, simply removing the last bit before the .zip (or .anything) extension. Begin in a main /directory and rename all files in each /subdirectory within the main.

Very last; I always try to put things together first... I get help from associates second - and I come to the interwebs last... but I want to UNDERSTAND how to code this exact sort of thing myself. If you have any suggestions of where to learn, that would be received well too. I tried to post this question CORRECTLY this time, pls forgive if I haven't gotten every rule correct.

pAULIE42o . . . . . /s

joanis
Paulie420
  • 1
    When both `)` and `(` are removed `1991)(Blackbeard` becomes `1991Blackbeard`. Is that what you want or should consecutive punctuation characters be replaced with a space/underscore? – oguz ismail Oct 07 '21 at 06:40
  • 1
    The Perl `rename` command will make this quite easy... https://stackoverflow.com/a/54831110/2836621 and https://stackoverflow.com/a/51641543/2836621 and https://stackoverflow.com/a/49412361/2836621 – Mark Setchell Oct 07 '21 at 07:04

4 Answers

2

The command tr -cd deletes all characters which are not in the given list.

for f in *.zip; do
  mv "$f" "$(tr -cd 'A-Za-z0-9. \n' <<< "$f")"
done

You can use sed to add a space between adjacent parentheses:

for f in *.zip; do
  mv "$f" "$(sed 's/)(/ /g' <<< "$f" | tr -cd 'A-Za-z0-9. \n')"
done

And you can use sed to merge multiple spaces.

for f in *.zip; do
  mv "$f" "$(sed 's/)(/ /g' <<< "$f" | tr -cd 'A-Za-z0-9. \n' | sed 's/ \+/ /g')"
done
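
The pipeline above can also be wrapped in a function, which makes it easier to try out before any file is touched. This is only a minimal sketch, assuming bash: `clean_name` is a hypothetical helper name, the second `tr` stands in for the last `sed`, and the loop only prints the `mv` commands and skips any name that would overwrite an existing file.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the cleaned form of one file name.
clean_name() {
  printf '%s\n' "$1" |
    sed 's/)(/ /g' |            # keep ")(" from gluing two words together
    tr -cd 'A-Za-z0-9. \n' |    # delete everything outside the allowed set
    tr -s ' '                   # squeeze runs of spaces into one
}

for f in *.zip; do
  [ -e "$f" ] || continue       # no .zip files here: the glob stayed literal
  new=$(clean_name "$f")
  [ "$f" = "$new" ] && continue # name is already clean
  if [ -e "$new" ]; then
    printf 'skipping %s: %s already exists\n' "$f" "$new" >&2
  else
    echo mv -- "$f" "$new"      # remove "echo" to really rename
  fi
done
```

Run it inside one directory first and read the printed commands before removing the `echo`.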
ceving
1

Edit: convert sequences of unwanted characters to one single underscore.

I assume that when you write "Filenames should only have chars A-Z & 1-9" you include lower case letters, plus the underscore to replace any sequence of unwanted characters. I also assume that you don't want leading or trailing underscores in the basenames after substitution.

Let's first write a small bash script file that takes the path of a zip file as its first and only parameter ($1), separates the directory ($d) and file ($f) parts with dirname and basename, computes the new file name with tr, sed and cut, and renames the file:

$ cat /mnt/Beers4TB/opendirs/TDC19/renamer.sh
#!/usr/bin/env bash
d="$(dirname "$1")"
f="$(basename -s .zip "$1" | tr -c a-zA-Z1-9 _ | sed 's/__*/_/g' |
    cut -c 1-76 | sed 's/^_//;s/_$//')"
mv "$1" "$d/$f.zip"

Next, let's make the script executable (chmod) and use find to walk the hierarchy and call the script on each found zip file (first backup your files, just in case something goes wrong):

$ cd /mnt/Beers4TB/opendirs/TDC19
$ chmod +x renamer.sh
$ find . -type f -name '*.zip' -exec ./renamer.sh '{}' \;

(in the exec action of find {} is replaced by the found file path).
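
Before renaming anything, it can help to preview what the same find walk would do. The following is a sketch only: `preview_renames` is a hypothetical function name, and it repeats the tr/sed/cut pipeline of renamer.sh inside an inline `sh -c` script instead of calling the script, printing `old -> new` pairs without touching any file.

```bash
#!/usr/bin/env bash

# Hypothetical preview: print "old -> new" for every .zip below the given
# root (default: current directory). Nothing is renamed.
preview_renames() {
  find "${1:-.}" -type f -name '*.zip' -exec sh -c '
    for p in "$@"; do
      d=$(dirname -- "$p")
      # Same pipeline as renamer.sh: replace, collapse, clip, trim.
      f=$(basename -- "$p" .zip | tr -c a-zA-Z1-9 _ | sed "s/__*/_/g" |
          cut -c 1-76 | sed "s/^_//;s/_\$//")
      printf "%s -> %s\n" "$p" "$d/$f.zip"
    done' sh {} +
}
```

For example, `preview_renames /mnt/Beers4TB/opendirs/TDC19` lists every planned rename for that collection. The `{} +` form passes many paths to one `sh` process, which is why the inline script loops over `"$@"`.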

Explanations:

  • tr is used to replace all unwanted characters by underscores (_). Option -c takes the complement of the specified character set:

      $ f='!!!Mean 18 - Golf Menu [SW] ('
      $ printf '%s' "$f" | tr -c a-zA-Z1-9 _
      ___Mean_18___Golf_Menu__SW___
    
  • sed is used to replace sequences of underscores by only one underscore (s/__*/_/g), delete a leading underscore (s/^_//) and delete a trailing underscore (s/_$//):

      $ f="___Mean_18___Golf_Menu__SW___"
      $ printf '%s' "$f" | sed 's/__*/_/g'
      _Mean_18_Golf_Menu_SW_
      $ f="_Mean_18_Golf_Menu_SW_"
      $ printf '%s' "$f" | sed 's/^_//;s/_$//'
      Mean_18_Golf_Menu_SW
    
  • cut is used to clip the modified base name to 80-4=76 characters. After restoring the .zip suffix it will have 80 characters at most. The -c X-Y option of cut selects characters number X to Y:

      $ f='abcdefghi'
      $ printf '%s' "$f" | cut -c 1-4
      abcd
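
The three steps explained above can be combined into one function that also copes with extensions other than .zip, which the question mentions. This is a sketch under stated assumptions, not the script above: `new_name` is a hypothetical name, the extension is taken to be everything after the last dot and is left unchanged, and the character set is written as 0-9 on the assumption that the digit 0 should be kept (with 1-9, as in the question's wording, a 0 would be replaced by an underscore).

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the cleaned path for one file, keeping its
# extension and limiting the file name to 80 characters in total.
new_name() {
  local path=$1 dir base ext stem
  dir=$(dirname -- "$path")
  base=$(basename -- "$path")
  case $base in
    ?*.*) ext=.${base##*.}; stem=${base%.*} ;;  # split at the last dot
    *)    ext=;             stem=$base ;;       # no extension at all
  esac
  # Replace unwanted characters, collapse underscores, clip, trim the ends.
  stem=$(printf '%s' "$stem" | tr -c 'a-zA-Z0-9' '_' | sed 's/__*/_/g' |
         cut -c "1-$((80 - ${#ext}))" | sed 's/^_//;s/_$//')
  printf '%s/%s%s\n' "$dir" "$stem" "$ext"
}
```

The function only prints the new path, so a caller can compare it with the old one and decide whether to run `mv`.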
    
Renaud Pacalet
  • 1
    The cut can be replaced by `f='abcdefghi'; printf '%.4s' "$f"` – Jetchisel Oct 07 '21 at 20:00
  • 1
    @Paulie420 I edited my answer to add this but please, edit your question to make this clear. Else future readers will not understand. – Renaud Pacalet Oct 11 '21 at 15:14
  • 1
    This would be much better if we could convert sequences of unwanted character to one single underscore. Such as, instead of: XArchRogueTool(1984)(Unknown)[Utility].zip Could the output be: X_Arch_Rogue_Tool_(1984)_(Unknown)_[Utility].zip? – Paulie420 Oct 11 '21 at 15:51
  • 2
    @Paulie420 What I show does this already: only one single underscore. But you just changed again your specifications: now you want to preserve also parentheses and square brackets. You should definitely edit your question, make your specifications 100% clear and add an example of input/output (not in comments, in your question). – Renaud Pacalet Oct 11 '21 at 16:00
1

Using a while + read loop, Process Substitution and find plus mv to rename the files.


The script.

#!/usr/bin/env bash

shopt -s extglob nullglob

while IFS= read -rd '' directory; do
  if [[ -e $directory && -x $directory ]] ; then
    (
      printf 'Entering directory %s\n' "$directory"
      cd "$directory" || exit
      files=(*.zip)
      (( ${#files[*]} )) || {
        printf 'There are no files ending in *.zip here!, moving on...\n'
        continue
      }
      for file_name_with_extension in *.zip; do
        extension=${file_name_with_extension##*.}
        file_name_without_extension=${file_name_with_extension%."$extension"}
        change_spaces_to_underscore="${file_name_without_extension//+([[:space:]])/_}"
        remove_everything_that_is_not_alnum_and_under_score="${change_spaces_to_underscore//[![:alnum:]_]}"
        change_every_underscore_with_a_single_under_score="${remove_everything_that_is_not_alnum_and_under_score//+(_)/_}"
        new_file_name="$change_every_underscore_with_a_single_under_score.$extension"
        echo mv -v "$file_name_with_extension" "${new_file_name::80}"
      done
    )
  fi
done < <(find . ! -name . -type d -print0)

The script for creating dummy directories and files.

#!/usr/bin/env bash

mkdir -p foo/bar/baz/more/qux/sux

cd foo/ &&  touch 'Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip'
cd bar/ &&  touch 'Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip'
cd baz/ && touch 'Metaltech- Battledrome Game Editor (1994)(Sierra On-Line, Inc.) [Utility].mp4'
cd more/ && touch 'Might and Magic III Character Editor (1991)(Blackbeard'\''s Ghost) [Utility].zip'
cd qux/ && touch 'Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip'
cd sux/ && touch 'Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].jpg'

Checking the directory tree with tree

tree foo/
foo/
├── bar
│   ├── baz
│   │   ├── Metaltech- Battledrome Game Editor (1994)(Sierra On-Line, Inc.) [Utility].mp4
│   │   └── more
│   │       ├── Might and Magic III Character Editor (1991)(Blackbeard's Ghost) [Utility].zip
│   │       └── qux
│   │           ├── Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip
│   │           └── sux
│   │               └── Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].jpg
│   └── Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip
└── Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip

5 directories, 6 files

Using find to print the files.

find foo/ ! -name . -type f 

The output is

foo/Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip
foo/bar/Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip
foo/bar/baz/more/Might and Magic III Character Editor (1991)(Blackbeard's Ghost) [Utility].zip
foo/bar/baz/more/qux/sux/Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].jpg
foo/bar/baz/more/qux/Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip
foo/bar/baz/Metaltech- Battledrome Game Editor (1994)(Sierra On-Line, Inc.) [Utility].mp4

Running the script inside the top-level directory prints something like:

Entering directory ./foo
mv -v Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip Mean_18_Golf_Menu_SW_1988Robert_J_Butler_Sports_Golf_Utility.zip
Entering directory ./foo/bar
mv -v Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip Mean_18_M18_1988Ken_Hopkins_Sports_Golf_Utility.zip
Entering directory ./foo/bar/baz
There are no files ending in *.zip here!, moving on...
Entering directory ./foo/bar/baz/more
mv -v Might and Magic III Character Editor (1991)(Blackbeard's Ghost) [Utility].zip Might_and_Magic_III_Character_Editor_1991Blackbeards_Ghost_Utility.zip
Entering directory ./foo/bar/baz/more/qux
mv -v Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip Might_Magic_3_Character_viewereditor_v11_1991Mark_Betz_and_Chris_Lampton_Editor.
Entering directory ./foo/bar/baz/more/qux/sux
There are no files ending in *.zip here!, moving on...

  • Remove the echo if you're satisfied with the output in order for mv to rename the files.

Without the echo the output is something like:

Entering directory ./foo
renamed 'Mean 18 - Golf Menu [SW] (1988)(Robert J. Butler) [Sports, Golf, Utility].zip' -> 'Mean_18_Golf_Menu_SW_1988Robert_J_Butler_Sports_Golf_Utility.zip'
Entering directory ./foo/bar
renamed 'Mean 18 - M18 (1988)(Ken Hopkins) [Sports, Golf, Utility].zip' -> 'Mean_18_M18_1988Ken_Hopkins_Sports_Golf_Utility.zip'
Entering directory ./foo/bar/baz
There are no files ending in *.zip here!, moving on...
Entering directory ./foo/bar/baz/more
renamed 'Might and Magic III Character Editor (1991)(Blackbeard'\''s Ghost) [Utility].zip' -> 'Might_and_Magic_III_Character_Editor_1991Blackbeards_Ghost_Utility.zip'
Entering directory ./foo/bar/baz/more/qux
renamed 'Might Magic 3 Character viewer-editor v1.1 (1991)(Mark Betz and Chris Lampton) [Editor].zip' -> 'Might_Magic_3_Character_viewereditor_v11_1991Mark_Betz_and_Chris_Lampton_Editor.'
Entering directory ./foo/bar/baz/more/qux/sux
There are no files ending in *.zip here!, moving on...
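
Note that `${new_file_name::80}` counts the extension as part of the 80 characters, which is why the longest name in the output above ends in `Editor.` with the `zip` cut off. A minimal sketch of truncating only the base name, assuming bash; `truncate_keep_extension` is a hypothetical helper name:

```bash
#!/usr/bin/env bash

# Hypothetical helper: shorten NAME to at most MAX characters (default 80)
# by trimming the end of the base name, never the extension.
truncate_keep_extension() {
  local name=$1 max=${2:-80} extension stem
  if [[ $name != *.* ]]; then          # no extension: plain truncation
    printf '%s\n' "${name:0:max}"
    return
  fi
  extension=${name##*.}
  stem=${name%."$extension"}
  # Leave room for the dot and the extension.
  stem=${stem:0:max-${#extension}-1}
  printf '%s.%s\n' "$stem" "$extension"
}

truncate_keep_extension 'Might_Magic_3_Character_viewereditor_v11_1991Mark_Betz_and_Chris_Lampton_Editor.zip'
# prints Might_Magic_3_Character_viewereditor_v11_1991Mark_Betz_and_Chris_Lampton_Edi.zip
```

Names that are already short enough pass through unchanged, because the substring expansion simply returns the whole base name.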

This would be much better if we could convert sequences of unwanted character to one single underscore. Such as, instead of: XArchRogueTool(1984)(Unknown)[Utility].zip Could the output be:

X_Arch_Rogue_Tool_(1984)_(Unknown)_[Utility].zip?

Change the value of remove_everything_that_is_not_alnum_and_under_score

from:

remove_everything_that_is_not_alnum_and_under_score="${change_spaces_to_underscore//[![:alnum:]_]}"

to

remove_everything_that_is_not_alnum_and_under_score="${change_spaces_to_underscore//[![:alnum:]_()\[\]]}" 

This keeps parentheses ( ) and brackets [ ] in the name instead of removing them.


Add the following line directly below the line that sets change_every_underscore_with_a_single_under_score.

insert_underscore_in_between_parens="${change_every_underscore_with_a_single_under_score//')('/')_('}"

Change the value of new_file_name= to "$insert_underscore_in_between_parens.$extension"

new_file_name="$insert_underscore_in_between_parens.$extension"

Pointing the script at a specific directory requires a bit of modification.

Add the code below after the shebang

directory_to_process="$1"

if [[ ! -e "$directory_to_process" ]]; then
  printf >&2 '%s no such file or directory!\n' "$directory_to_process"
  exit 1
elif [[ ! -d "$directory_to_process" ]]; then
  printf >&2 '%s does not appear to be a directory!\n' "$directory_to_process"
  exit 1
fi

Then replace the . in the find command with that variable:

find "$directory_to_process" ! -name . -type d -print0

The new script.

#!/usr/bin/env bash

directory_to_process="$1"

if [[ ! -e "$directory_to_process" ]]; then
  printf >&2 '[%s] no such file or directory!\n' "$directory_to_process"
  exit 1
elif [[ ! -d "$directory_to_process" ]]; then
  printf >&2 '[%s] does not appear to be a directory!\n' "$directory_to_process"
  exit 1
fi

shopt -s extglob nullglob

while IFS= read -rd '' directory; do
  if [[ -e $directory && -x $directory ]] ; then
    (
      printf 'Entering directory %s\n' "$directory"
      cd "$directory" || exit
      files=(*.zip)
      (( ${#files[*]} )) || {
        printf 'There are no files ending in *.zip here!, moving on...\n'
        continue
      }
      for file_name_with_extension in *.zip; do
        extension=${file_name_with_extension##*.}
        file_name_without_extension=${file_name_with_extension%."$extension"}
        change_spaces_to_underscore="${file_name_without_extension//+([[:space:]])/_}"
        remove_everything_that_is_not_alnum_and_under_score="${change_spaces_to_underscore//[![:alnum:]_()\[\]]}"
        change_every_underscore_with_a_single_under_score="${remove_everything_that_is_not_alnum_and_under_score//+(_)/_}"
        insert_underscore_in_between_parens="${change_every_underscore_with_a_single_under_score//')('/')_('}"
        # Trim the base name first so the extension survives the 80-character limit.
        new_file_name="${insert_underscore_in_between_parens:0:80-${#extension}-1}.$extension"
        echo mv -v "$file_name_with_extension" "$new_file_name"
      done
    )
  fi
done < <(find "$directory_to_process" ! -name . -type d -print0)

Now you pass the directory as an argument to the script, e.g.:

./script.sh foo/

Or an absolute path.

./script.sh /path/to/foo

If you add the script to your PATH and make it executable, you can call it by name:

script.sh /path/to/foo

Assuming your script name is script.sh and the directory you want to process is named foo


Jetchisel
  • 1
    Wow, wow, w0w, guys. I was gonna respond to the very first poster - and then there was 3 others that did just as good a job explaining things in other ways. This is really helpful; and finally I am GRASPING the how and why. I give ya'll many thanks - appreciate it. With this info I can get something patched up that will work perfectly for me. – Paulie420 Oct 11 '21 at 13:49
  • 1
    Glad to be of help. I'm pretty sure the other answers are as good as mine. Now if and when you decided to pick an answer/solution, please have a look at https://stackoverflow.com/help/someone-answers – Jetchisel Oct 11 '21 at 20:31
1

I recommend punycoding the names, although I have no adequate answer for how to shorten them to fit in 80 characters. (The punycode process is completely reversible and keeps the ASCII characters in their places, giving you a readable file name, and it can be modified to take the case of the characters into account.)

For names that are still too long, I'd use some kind of fixed-length hash function to avoid name clashes, but that step is not reversible at all: you lose part of the name. You will need to decide which of these options suits you before anyone can help further.
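
The hash idea can be sketched as follows. This is an illustration under assumptions rather than a finished tool: `shorten_with_hash` is a hypothetical helper name, the CRC printed by the POSIX `cksum` utility stands in for "some kind of fixed length hash", the name is assumed to have an extension, and the punycode step is not shown.

```bash
#!/usr/bin/env bash

# Hypothetical helper: if NAME is longer than MAX characters (default 80),
# cut the base name and append an 8-digit hex CRC of the full original name,
# so two long names with the same beginning still get different results.
shorten_with_hash() {
  local name=$1 max=${2:-80} extension stem crc
  if (( ${#name} <= max )); then
    printf '%s\n' "$name"              # short enough: leave it alone
    return
  fi
  extension=${name##*.}
  stem=${name%."$extension"}
  crc=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
  crc=$(printf '%08x' "$crc")          # decimal CRC -> 8 hex digits
  # Room needed: "_" + 8 hex digits + "." + extension.
  stem=${stem:0:max-${#extension}-1-9}
  printf '%s_%s.%s\n' "$stem" "$crc" "$extension"
}
```

As the answer says, this is lossy: the original name cannot be recovered from the shortened one, so keeping a list of old and new names alongside the archive may be worthwhile.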

Luis Colorado