
I'm struggling to get a nested for loop to hardlink every third file from each subdirectory into a matching directory elsewhere. The data set I'm working with is very large (a little over a million files).

I tried a nested for loop, but it doesn't behave the way I expect:

```bash
count=0
for dir in $(find "$sourceDir" -mindepth 1 -maxdepth 1 -type d)
do
        (
                mkdir -p "$destDir/$dir"
                for file in $(find . -type f)
                do
                        (
                        if [ $((count % 3)) -eq 2 ]
                        then
                                cp -prl "$file" $destDir/$dir
                        fi
                        ((count ++))
                        )
                done
        )
        ((count++))
done
```

This only goes into the last directory and finds the third file. I need it to enter every directory and find every third file.

I've thought of breaking this up into chunks and running several scripts instead of just one to make it more scalable.
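On the idea of splitting the work into chunks, one option is to let `xargs -P` run one worker per top-level directory in parallel instead of maintaining several scripts by hand. This is a sketch under assumptions. `parallel_link_every_nth` is a hypothetical helper name, the stride `n` and the job count are parameters, and `-P` is a GNU/BSD `xargs` extension rather than POSIX. Each worker sorts only its own directory, so running directories in parallel doesn't change which files get picked. File names are read line by line, so names containing newlines are not handled.

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): run one worker per top-level
# directory of SRC, at most JOBS at a time, hardlinking every Nth file
# (alphabetical order) into DEST/<dirname>. xargs -P is GNU/BSD, not POSIX.
parallel_link_every_nth() {
    local src=$1 dest=$2 n=$3 jobs=${4:-4}
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 -P "$jobs" bash -c '
            dest=$1 n=$2 dir=$3          # fixed args first, then one directory from xargs
            name=${dir##*/}
            mkdir -p "$dest/$name"
            # each worker sorts only its own directory, then keeps lines n, 2n, 3n, ...
            find "$dir" -type f | sort | awk -v n="$n" "NR % n == 0" |
                while IFS= read -r f; do ln "$f" "$dest/$name/"; done
        ' _ "$dest" "$n"
}

# Usage (placeholder paths): parallel_link_every_nth "$sourceDir" "$destDir" 720 8
```

`xargs` waits for all workers before it exits, so the function returns only after every directory has been processed.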

rptatum
  • Take the time to copy/paste your code one line at a time and look at the output before executing the next line of code. I think you need a `cd $dir` after the `mkdir`, but your description makes it hard to be sure. If you do add `cd $dir`, you'll need a `cd ..` to "get out of" that subdir just before the inner `done`. Also, I don't think you need the `( )` pair on that inner loop (I could be wrong). Good luck. – shellter Jan 16 '23 at 23:07
  • Put a valid [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) and paste your script at https://shellcheck.net for validation and recommendations. That said, if you could explain more about what the script should do and add a simple input and the desired output, that would help others understand it better than just posting broken code. – Jetchisel Jan 17 '23 at 01:16
  • Look at https://mywiki.wooledge.org/BashFAQ/001, in particular `find` with `-print0`, for your loops. You do not need `( )` inside `do ... done`. *"Third file"* based on what? Alphabetical order? Date? Something else? For such a large number of directories and files, it might be better to switch to a language with better performance (e.g. C, Perl, Python). – Nic3500 Jan 17 '23 at 01:32
  • @Nic3500 That's such a good resource. Thank you for sharing. It's technically not every third file; it's every 720th file in alphabetical order, but limitations in my testing environment made that hard to recreate. Thankfully, they're all .tif files. – rptatum Jan 17 '23 at 05:52
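Following the BashFAQ/001 pointer, here is a minimal sketch that walks the directories with `find -print0` and picks every Nth file in alphabetical order. The same code then covers both the 720-file stride from the comment above and the 3-file stride used for testing. `link_every_nth` is a hypothetical name. The sketch assumes several things:

- `sort -z` is available (GNU and modern BSD `sort` have it).
- The destination lies outside the source tree.
- File basenames don't collide within a subdirectory tree.

"Alphabetical" here means byte order under `LC_ALL=C`, which can differ from what `ls` shows in other locales.

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): hardlink every Nth regular file,
# in byte-wise alphabetical order, from each immediate subdirectory of SRC
# into DEST/<subdirectory name>. Assumes DEST is outside SRC and that file
# basenames are unique within each subdirectory tree.
link_every_nth() {
    local src=$1 dest=$2 n=$3
    local dir name f count
    # -print0 with read -d '' keeps names containing spaces or newlines intact
    while IFS= read -r -d '' dir; do
        name=${dir##*/}
        mkdir -p "$dest/$name"
        count=0
        # sort -z sorts NUL-terminated names; LC_ALL=C makes the order locale-independent
        while IFS= read -r -d '' f; do
            count=$((count + 1))
            if (( count % n == 0 )); then
                ln "$f" "$dest/$name/"
            fi
        done < <(find "$dir" -type f -print0 | LC_ALL=C sort -z)
    done < <(find "$src" -mindepth 1 -maxdepth 1 -type d -print0)
}

# Usage (placeholder paths): 720 for the real data, 3 for the small test setup
# link_every_nth "$sourceDir" "$destDir" 720
```

The counter resets for each subdirectory, so the stride is counted per directory rather than across the whole data set.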

1 Answer


I was able to figure it out thanks to the commenters! My input was a folder with 4 subfolders, each containing 12 files.

My desired output was every 3rd file (starting with the third) hardlinked into an external location, grouped by subdirectory, something like this:

- subdirA: hardlinks to the 3rd, 6th, 9th and 12th files
- subdirB: hardlinks to the 3rd, 6th, ... files
- ... and so on!
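To make the input and desired output concrete, this sketch rebuilds the sample layout (4 subfolders with 12 files each) in a scratch directory. It then does a dry run that only prints which files would be hardlinked, using the same `sort | awk 'NR % 3 == 0'` selection as the script below. The names `subdirA` to `subdirD` and `fileNN.tif` are made up for illustration; only the counts come from the description above.

```bash
#!/bin/bash
# Rebuild the sample input in a scratch directory: 4 subfolders x 12 empty files.
# subdirA..subdirD and fileNN.tif are hypothetical names used only for this demo.
sample=$(mktemp -d)
for s in A B C D; do
    mkdir -p "$sample/subdir$s"
    for i in $(seq -w 1 12); do        # -w zero-pads (01..12) so alphabetical == numeric order
        : > "$sample/subdir$s/file$i.tif"
    done
done

# Dry run: print, per subfolder, the files that would be hardlinked (nothing is created).
for d in "$sample"/*/; do
    printf '%s:' "$(basename "$d")"
    find "$d" -type f | sort | awk 'NR % 3 == 0' | while IFS= read -r f; do
        printf ' %s' "${f##*/}"
    done
    printf '\n'
done
```

With this layout, each line should read like `subdirA: file03.tif file06.tif file09.tif file12.tif`.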

Here is what got it to work:

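```bash
#!/bin/bash

for d in */; do
        d=${d%/}    # */ matches only directories; strip the trailing slash
        echo "$d"
        mkdir -p "Desktop/testjan16/$d"

        #### loops through each file in the folder and hardlinks every third file (starting with the 3rd) to the appropriate directory
        # (reads the list line by line so file names containing spaces are not split)
        find "./$d" -type f | sort | awk 'NR % 3 == 0' | while IFS= read -r f; do
                ln "$f" "Desktop/testjan16/$d/"
        done
done
```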
rptatum