
I am trying to resize photos that are larger than specific dimensions, for hundreds of thousands of photos collected by a system over the past 10 years. I am using find and ImageMagick.

I wrote this script to do it.

#!/bin/bash
ResizeSize="1080^>"
Processing=0

find . -type f -iname '*JPG' -print0 |
while IFS= read -r -d '' image; do
    ((Processing++))
    echo Processing file: $Processing
    echo Resizing """$image"""
    convert """$image""" -resize "$ResizeSize" """$image""___"
    if [ $? -eq 0 ] ; then
      rm """$image"""
      if [ $? -eq 0 ] ; then
        mv """$image""___" """$image"""
      fi
    else
      echo something wrong with resize
      exit 1
    fi
done
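What the shell actually does with the script's quoting can be checked directly. A small sketch, using a made-up file name that contains a space:

```bash
#!/bin/bash
image='holiday photo.JPG'   # hypothetical file name containing a space

# """$image""" is parsed as "" + "$image" + "": the empty strings add nothing,
# so it is exactly the same word as "$image".
triple="""$image"""
double="$image"

# The script's temporary name, """$image""___", is "" + "$image" + "___".
script_tmp="""$image""___"

# Without braces, "$image__" refers to a different (here unset) variable, image__.
unbraced="$image__"
# Braces end the variable name, so the suffix is appended to $image instead.
braced="${image}___"

printf '[%s]\n' "$triple" "$double" "$script_tmp" "$unbraced" "$braced"
```

So plain `"$image"` and `"${image}___"` behave the same as the triple-quoted forms and are easier to read.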

The script works on a small number of files, but with lots of files it takes a long time to start. I have compared `find . -type f -iname '*JPG' -print0` with `find . -type f -iname '*JPG'` on the command line. The latter finds files within a few seconds, but the former takes minutes before anything is found. Unfortunately, -print0 is required for dealing with filenames containing special characters (mainly spaces in my case). How can I make this script more efficient?
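For reference, a sketch of two ways to enumerate the files so that names with spaces stay intact. The function names are made up for illustration. The globstar variant needs bash 4 or newer (macOS ships bash 3.2, so it would need a newer bash installed) and, unlike find, skips hidden files and directories unless `dotglob` is also set.

```bash
#!/bin/bash
# Count JPEG files under a directory with find -print0. Each name ends in a
# NUL byte, which cannot occur in a file name, so spaces (or even newlines)
# inside a name never split it in two.
count_with_find() {
    local n=0 image
    while IFS= read -r -d '' image; do
        n=$((n + 1))
    done < <(find "$1" -type f -iname '*.jpg' -print0)
    # Process substitution (< <(...)) keeps the loop in the current shell,
    # so $n is still set here; with `find ... | while` its updates would be lost.
    echo "$n"
}

# Alternative without find: bash >= 4 globstar. A quoted "$f" is never
# word-split, so spaces are safe here as well.
count_with_globstar() (
    # Subshell body: these shopt settings do not leak into the caller.
    shopt -s globstar nullglob nocaseglob
    n=0
    for f in "$1"/**/*.jpg; do
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
)
```

Usage: `count_with_find .` or `count_with_globstar .`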

cmdln
  • Are you sure you're not just experiencing file system caching benefits in your second call? Try calling your slower `find .. -print0` twice in a row. – randomir Jul 11 '17 at 12:35
  • Triple quotes aren't a thing in shell. `"""$image"""` is just a quoted string concatenated with two empty strings on either side. – chepner Jul 11 '17 at 13:20
  • @randomir I have tested that this is not a caching thing. The result is the same every time. I should mention that I am using BSD (OSX) find. – cmdln Jul 11 '17 at 13:38
  • @chepner I was struggling to get this script to work. Somewhere I read I should be using double quotes; when that didn't work I gave triple quotes a go, and now it works. I don't really understand why, but I am guessing it's because my files have spaces. I did try escaping the outside quotes but that didn't work either. Well, at least for the line `convert """$image""" -resize $ResizeSize """$image""___"` – cmdln Jul 11 '17 at 13:44
  • The only issue that might pop up is related to your attempt at creating a temporary file. `"$image__"` attempts to expand a parameter named `image__`, not append `__` to the value of the parameter `image`. That, however, can be accommodated by using the "long" form of parameter expansion: `"${image}__"`. – chepner Jul 11 '17 at 13:46
  • @cmdln, your "triple quotes" (`"""$var"""`) are effectively equal to proper double quotes (`"$var"`). You are simply concatenating empty strings `""` before and after the `"$var"` string. – randomir Jul 11 '17 at 13:46
  • As an aside, what version of `bash` are you using? A simple `for f in **/*.JPG; do` will be a lot simpler than dealing with `find`. – chepner Jul 11 '17 at 13:48
  • I regularly resize 60,000+ images per day - you need to use **GNU Parallel** - please see here https://stackoverflow.com/a/42670939/2836621 and here https://stackoverflow.com/a/38838907/2836621 – Mark Setchell Jul 11 '17 at 14:01
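The comment above recommends GNU Parallel. As a swapped-in alternative, here is a minimal sketch of the same idea with `xargs -0 -P`, which both BSD and GNU xargs provide. It assumes ImageMagick's `convert` is on PATH and that overwriting the originals is acceptable (try it on a copy first). `resize_one`, `resize_tree` and the default of 4 parallel jobs are hypothetical choices, not part of the question.

```bash
#!/bin/bash
# Resize one image in place. The temporary name keeps the script's "___"
# suffix so that find's -iname '*.jpg' never picks it up mid-run; the JPEG:
# prefix tells ImageMagick which format to write despite that suffix.
resize_one() {
    local image=$1
    [ -f "$image" ] || return 0          # nothing to do (e.g. empty input)
    local tmp="${image}___"
    if convert "$image" -resize '1080^>' "JPEG:$tmp"; then
        mv -f "$tmp" "$image"            # replace the original in one step
    else
        rm -f "$tmp"
        echo "resize failed: $image" >&2
        return 1
    fi
}
export -f resize_one   # let the bash processes started by xargs see it

# Feed NUL-terminated names to xargs, which runs up to $2 (default 4)
# resize_one calls at the same time.
resize_tree() {
    find "$1" -iname '*.jpg' -type f -print0 |
        xargs -0 -n 1 -P "${2:-4}" bash -c 'resize_one "$1"' _
}
```

Usage: `resize_tree . 8`. Using `mv -f` instead of `rm` followed by `mv` also means there is never a moment when the image is missing from disk.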

1 Answer


I cannot reproduce the behavior you're experiencing, but I can think of two possible explanations.

First, you might be experiencing positive effects of page (disk) caching.

When you call find for the first time, it traverses the directory tree and reads file metadata (inodes), which requires the kernel to read from the storage medium (HDD) via system calls. The kernel also keeps that data in otherwise unused memory, transparently to find and other applications. If the same data is read again later, it comes from this in-memory cache instead of the disk. This is called page caching.

So, your second call to find (no matter what output separator is used) will be a lot faster, assuming you are searching over the same files, with the same criteria.
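One way to check this is the test suggested in the comments: run the same find twice in a row and compare. A sketch using bash's `time` keyword; the function name is hypothetical.

```bash
#!/bin/bash
# Time the same find twice in a row. If the first run is slow and the second
# fast, the difference is the cost of reading metadata from disk, which the
# kernel has cached by the time of the second run.
time_find_twice() {
    local run TIMEFORMAT='%R seconds'
    for run in 1 2; do
        printf 'run %d: ' "$run" >&2
        time find "$1" -iname '*.jpg' -type f -print0 > /dev/null
    done
}
```

Usage: `time_find_twice .` (both timings are printed on stderr).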

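Second, find's output is normally buffered by the C standard I/O library, so if your files are spread over many different locations, it can take some time before the first chunk of output reaches the while loop. Also, when output goes to a terminal it is usually line-buffered: each newline-terminated name is flushed immediately, but with -print0 there are no newlines at all, so nothing appears until the buffer fills. That would explain why the -print0 variant takes longer to produce its first output.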

You can try running find with unbuffered output via the stdbuf command (from GNU coreutils):

stdbuf -o0 find . -iname '*.jpg' -type f -print0 ...
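A sketch of how an unbuffered find could feed the question's while loop. It assumes GNU coreutils is installed; Homebrew installs the tool as `gstdbuf` unless its gnubin directory has been added to PATH. The function names are hypothetical. Note that stdbuf only changes the default stdio buffering of the command it runs.

```bash
#!/bin/bash
# Use GNU stdbuf if present, else Homebrew's g-prefixed name, else nothing.
if command -v stdbuf >/dev/null 2>&1; then
    STDBUF=stdbuf
elif command -v gstdbuf >/dev/null 2>&1; then
    STDBUF=gstdbuf
else
    STDBUF=
fi

# Emit NUL-terminated names, unbuffered when stdbuf is available.
stream_jpgs() {
    if [ -n "$STDBUF" ]; then
        "$STDBUF" -o0 find "$1" -iname '*.jpg' -type f -print0
    else
        find "$1" -iname '*.jpg' -type f -print0   # buffered fallback
    fi
}

# The question's loop, fed by stream_jpgs: each name is handled as soon as
# find writes it instead of when find's output buffer fills.
process_jpgs() {
    local image n=0
    while IFS= read -r -d '' image; do
        n=$((n + 1))
        printf 'Processing file %d: %s\n' "$n" "$image"
        # convert/rm/mv for "$image" would go here
    done < <(stream_jpgs "$1")
}
```

Usage: `process_jpgs .`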

One more thing, unrelated to this: to speed up your find search, you might want to consider calling it like this:

find . -iname '*.jpg' -type f -print0

Here we put the -iname test before the -type test to avoid having to call stat(2) on every file: the file type is then checked only for names that already match. Even better would be to remove the -type test altogether, if possible.
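A sketch showing that the reordering changes only the cost, not the result, and what dropping -type f would change. The function names and the directory named `old.jpg` are made up for illustration. GNU find may already move name tests first at its default optimisation level, so the gain is likely larger with BSD find.

```bash
#!/bin/bash
# Both predicate orders select exactly the same files. As written, the tests
# are evaluated left to right and stop at the first false one, so with
# -iname first the file-type check runs only for names that already match.
jpgs_type_first() { find "$1" -type f -iname '*.jpg' -print0; }
jpgs_name_first() { find "$1" -iname '*.jpg' -type f -print0; }

# Dropping -type f saves that check entirely, but then anything else whose
# name matches (for example a directory called "old.jpg") is listed too.
jpgs_name_only()  { find "$1" -iname '*.jpg' -print0; }

# Count NUL-terminated names on stdin.
count_nul() { tr -cd '\0' | wc -c | tr -d ' '; }
```

Usage: `jpgs_name_first . | count_nul`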

randomir
  • `stdbuf` sorts out the issue. I had to install coreutils for OSX to get it and add it to the path like this `export PATH=/usr/local/opt/coreutils/libexec/gnubin:$PATH` – cmdln Jul 11 '17 at 14:23
  • Ok, so it was the buffering. I'm glad you made it work. – randomir Jul 11 '17 at 14:36