
How would I choose x random file names from a directory of files and then store those names in a list to use later in the script?

I have a directory with a bunch of files in it. When the script runs, I would like it to read that directory, choose 3 of those file names, and assign them to a list, which later in the script I can loop over with a for loop and do something with each entry.

Thanks

I'm a noob... please treat me as such.

(pseudocode example)

files = /path/*
random = 3 files names from the path, no repetitions

echo $random[0]
echo $random[1]
echo $random[2]
aJynks
  • This might help: [Simple method to shuffle the elements of an array in BASH shell?](https://stackoverflow.com/q/5533569/3776858) – Cyrus Aug 22 '20 at 17:10

2 Answers


Well, you can do it in three parts:

  1. collect all filenames from path in temporary array;
  2. loop once for each random filename desired, choose random element from temp array;
  3. output selected filenames

In bash, that would look like:

#!/bin/bash

oifs="$IFS"                         ## save original Internal Field Separator
IFS=$'\n';                          ## set IFS to \n to accommodate spaces in filenames
a=($(find /path/to/files -type f))  ## temporary array holding all filenames
IFS="$oifs"                         ## restore original IFS
n=${#a[@]}                          ## number of files in array

for ((i=0; i<3; i++)); do           ## loop over number of random files desired
    b[$i]=${a[$((RANDOM % n))]}     ## choose random element from temp array
done

for ((i=0; i<3; i++)); do           ## loop again outputting chosen files
    echo "${b[i]}"
done

For the purposes of the example I have hardcoded 3 as the number of random filenames to choose; you can handle that any way you like. You should add a check that ((n > 0)) before using n in $((RANDOM % n)), and you can unset the temporary array if you like. Those are left to you.
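The check on n, and the "no repetitions" requirement from the question, could be handled along the following lines. This is only a sketch, not the code from the answer: the function name pick_unique and the demo directory are made up for illustration, it uses a glob instead of find (so it does not descend into subdirectories and skips hidden files), and it assumes filenames contain no newline characters. It uses a partial Fisher-Yates shuffle, so each file can be chosen at most once.

```shell
#!/bin/bash

# pick_unique DIR COUNT (hypothetical helper): print up to COUNT distinct
# regular files from DIR, one per line. Assumes no newlines in filenames.
pick_unique() {
    local dir=$1 want=$2 i j tmp f n
    local -a files=()

    for f in "$dir"/*; do                 # the glob copes with spaces in names
        [[ -f $f ]] && files+=("$f")      # keep regular files only
    done

    n=${#files[@]}
    if (( n == 0 )); then                 # guard before using RANDOM % ...
        echo "no files found in $dir" >&2
        return 1
    fi
    if (( want > n )); then               # cannot pick more than exist
        want=$n
    fi

    # Partial Fisher-Yates shuffle: swap a random remaining element into
    # position i, so an element can never be selected twice.
    for (( i = 0; i < want; i++ )); do
        j=$(( i + RANDOM % (n - i) ))
        tmp=${files[i]}; files[i]=${files[j]}; files[j]=$tmp
        printf '%s\n' "${files[i]}"
    done
}

# Demo on a throwaway directory (stand-in for /path/to/files)
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b c.txt" "$demo/d.txt" "$demo/e.txt"
mapfile -t picked < <(pick_unique "$demo" 3)
for f in "${picked[@]}"; do
    echo "picked: $f"
done
rm -r "$demo"
```

The `RANDOM % (n - i)` step still carries the small modulo bias discussed in the comments; for a pick-3 that is usually acceptable.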

If you have hundreds of thousands or millions of files, you may use a temp file instead of an array and then use sed to pick a random line from the file.
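A rough sketch of the temp-file idea follows. The helper name random_line and the demo directory are assumptions for illustration, and filenames are again assumed to contain no newlines. Because bash documents RANDOM as a 15-bit value (0 to 32767), the sketch combines two draws so that line numbers beyond 32767 can be reached.

```shell
#!/bin/bash

# random_line FILE (hypothetical helper): print one randomly chosen line.
random_line() {
    local file=$1 n r
    n=$(wc -l < "$file")                  # number of lines in the list
    (( n > 0 )) || return 1               # nothing to choose from
    # Combine two 15-bit draws into one 30-bit value, then map to 1..n
    r=$(( ((RANDOM << 15) | RANDOM) % n + 1 ))
    sed -n "${r}{p;q;}" "$file"           # print line r, then stop reading
}

# Demo on a throwaway directory (stand-in for /path/to/files)
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c" "$dir/d"
list=$(mktemp)
find "$dir" -type f > "$list"             # one filename per line

for (( i = 0; i < 3; i++ )); do
    random_line "$list"                   # note: repeats are possible here
done

rm -r "$dir" "$list"
```

As written, the same line can come up more than once; avoiding that would need extra bookkeeping of the line numbers already used.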

Look things over and let me know if you have any questions.

David C. Rankin
  • `RANDOM % n` has an uneven distribution when 65536 is not a multiple of n. To evenly distribute `RANDOM` regardless of divisor you compute the largest multiple of n that is less than 65536 for n=3: `n=3; m=$((n*65536/n))` gives `m=65535`. Then you rerun `while rnd=$RANDOM; do ((rnd – Léa Gris Aug 22 '20 at 19:22
  • Oh yes, `RANDOM` is far from perfect - with warts and all. For more than 16-bit values you can do some nutty stuff like concatenate two grabs, etc... Builtin `RANDOM` leaves much to be desired. – David C. Rankin Aug 22 '20 at 19:27
  • Given shell arithmetic does not respect division priority over multiplication: have to use `m=$((65536/n*n))` or `m=$((65536-65536%n))` – Léa Gris Aug 22 '20 at 19:36
  • Yes, depending on the refinements to `RANDOM` the OP may need, the tweak to eliminate some of the inherent bias may be warranted. On the other hand, if it's just a pick-3 and the quality of the random number produced isn't paramount, then `RANDOM` will work. If the OP was concerned about that random quality, such as for cryptographic purposes, then shell wouldn't be the appropriate tool for the job. – David C. Rankin Aug 22 '20 at 19:43
  • `a=($(find /path/to/files -type f))` won't work as intended when pathnames contain blank characters. – M. Nejat Aydin Aug 22 '20 at 20:02
  • @DavidC.Rankin You will need to adjust the `IFS` if `-print0` is to be used in the array assignment. BTW, I think the algorithm has another problem: a file may be chosen more than once by the algorithm in the code, which will cause repeated filenames to be output. – M. Nejat Aydin Aug 22 '20 at 20:10
  • That's what I did, I set `IFS` and restored it after filling the array. Good catch, thank you. – David C. Rankin Aug 22 '20 at 20:50
  • wouldn't there be a chance to get the same filename twice with this? – aJynks Aug 24 '20 at 09:58
  • Yes, technically you can loop collecting the random numbers in an array checking the previously stored numbers until you have unique randoms. – David C. Rankin Aug 24 '20 at 16:23
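The bias fix discussed in these comments is usually called rejection sampling: discard any draw at or above the largest multiple of n, then take the remainder. A minimal sketch follows; the function name rand_below is hypothetical. It uses 32768 as the number of possible values, since the bash manual documents RANDOM as ranging from 0 to 32767, rather than the 65536 figure quoted in the comments.

```shell
#!/bin/bash

# rand_below N (hypothetical helper): set REPLY to an evenly distributed
# integer in 0..N-1. Assumes 1 <= N <= 32768.
rand_below() {
    local n=$1 limit r
    limit=$(( 32768 - 32768 % n ))   # largest multiple of n that is <= 32768
    r=$RANDOM
    while (( r >= limit )); do       # reject draws from the uneven tail
        r=$RANDOM
    done
    REPLY=$(( r % n ))
}

rand_below 3
echo "random index: $REPLY"
```

The result is returned in REPLY rather than printed, so the function can be called without a command substitution.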

With the GNU shuf utility, it is a one-liner if there are only files (not subdirectories) in /path/ :

shuf -e -n3 /path/*

You can store the random pathnames in an array (named arr below) like this:

IFS=$'\n' read -d '' -r -a arr < <(shuf -e -n3 /path/*)

assuming pathnames don't contain a newline character. Or, as a slightly simpler alternative using mapfile (synonym of readarray):

mapfile -t arr < <(shuf -e -n3 /path/*)

If it is possible that pathnames contain newline characters, then:

mapfile -d '' arr < <(shuf -zen3 /path/*)

as remarked in the comments by Léa Gris. This last version will work with any valid pathname and should be preferred if you are using GNU utilities.
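To tie this back to the question, the array can then be looped over like any other bash array. The sketch below assumes GNU shuf and bash 4.4 or later (for `mapfile -d`); the temporary directory is only a stand-in for /path, and `nullglob` is an added precaution so that an empty directory yields an empty array instead of a literal `*`.

```shell
#!/bin/bash

shopt -s nullglob                      # unmatched glob expands to nothing

dir=$(mktemp -d)                       # stand-in for /path
touch "$dir/a.txt" "$dir/b c.txt" "$dir/d.txt" "$dir/e.txt"

# Pick 3 distinct pathnames, NUL-delimited so any valid name survives
mapfile -d '' picked < <(shuf -zen3 "$dir"/*)

for f in "${picked[@]}"; do
    printf 'processing: %s\n' "$f"     # replace with the real work
done

rm -r "$dir"
```

shuf selects without repetition by default, so the three entries are always different files.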

M. Nejat Aydin