choose x random (no repeats) file names from a directory of files and store them in a list?

Question

How would I choose x random file names from a directory of files and then store those names in a list to use later in the script?

I have a directory with a bunch of files in it... When the script runs, I would like it to read that directory, choose 3 of those file names, and assign them to a list... which later in the script I can loop over with for and do something with each entry.

Thanks

I'm a noob... please treat me as such.

(pseudocode example)

files = /path/*
random = 3 files names from the path, no repetitions

echo $random[0]
echo $random[1]
echo $random[2]

This might help: [Simple method to shuffle the elements of an array in BASH shell?](https://stackoverflow.com/q/5533569/3776858) — Cyrus, Aug 22 '20 at 17:10

David C. Rankin · Answer 1 · 2020-08-22T20:50:11.497

2

Well, you can do it in three parts:

1. collect all filenames from the path in a temporary array;
2. loop once for each random filename desired, choosing a random element from the temp array;
3. output the selected filenames.

In bash, that would look like:

#!/bin/bash

oifs="$IFS"                         ## save original Internal Field Separator
IFS=$'\n';                          ## set IFS to \n to accommodate spaces in filenames
a=($(find /path/to/files -type f))  ## temporary array holding all filenames
IFS="$oifs"                         ## restore original IFS
n=${#a[@]}                          ## number of files in array

for ((i=0; i<3; i++)); do           ## loop over number of random files desired
    b[$i]=${a[$((RANDOM % n))]}     ## choose random element from temp array
done

for ((i=0; i<3; i++)); do           ## loop again outputting chosen files
    echo "${b[i]}"
done

For the purposes of the example I have hardcoded 3 as the number of random filenames to choose; you can handle that any way you like. You should add a check that ((n > 0)) before using n in $((RANDOM % n)), and you can unset the temporary array when you are done with it. Both are left to you. Note that, as written, the loop can pick the same file more than once (see the comments below).
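Since the question asks for no repeats, here is a minimal sketch of one way to get them, assuming a partial Fisher-Yates shuffle is acceptable in place of the loop above. It is not the answer's original method. The demo directory, the file names, and the variable names `want`, `files`, and `picked` are made up for illustration. It also uses a glob rather than find, so it picks up anything in the directory, not only regular files, and it keeps the slight bias of `RANDOM % n`.

```shell
#!/bin/bash
# Sketch: pick `want` distinct entries with a partial Fisher-Yates shuffle.
# The demo directory and file names are hypothetical; point `dir` at your own path.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b c.txt" "$dir/d.txt" "$dir/e.txt" "$dir/f.txt"

want=3
shopt -s nullglob                  # an empty directory gives an empty array
files=("$dir"/*)                   # the glob keeps names with spaces intact
shopt -u nullglob
n=${#files[@]}

if (( n < want )); then            # also covers the n == 0 case
    echo "need at least $want files, found $n" >&2
    exit 1
fi

picked=()
for ((i = 0; i < want; i++)); do
    j=$(( i + RANDOM % (n - i) ))  # random index in i..n-1 (slight modulo bias)
    tmp=${files[i]}                # swap the chosen element into slot i
    files[i]=${files[j]}
    files[j]=$tmp
    picked+=("${files[i]}")        # slot i is never revisited, so no repeats
done

for f in "${picked[@]}"; do
    echo "$f"
done
rm -rf "$dir"                      # clean up the demo directory
```

Because each chosen element is swapped out of the remaining range, it cannot be drawn again. RANDOM only goes up to 32767, so this sketch assumes the directory holds fewer entries than that.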

If you have hundreds of thousands or millions of files, you may use a temp file instead of an array and then use sed to pick a random line from the file.
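A rough sketch of that temp-file idea, under a few assumptions: pathnames contain no newline characters, the demo directory and the names `list`, `k`, and `choice` are hypothetical, and two RANDOM draws are combined into a 30-bit number because a single RANDOM only covers 0..32767.

```shell
#!/bin/bash
# Sketch: keep the file list in a temp file and pull one random line with sed.
dir=$(mktemp -d)                         # hypothetical demo directory
touch "$dir/one" "$dir/two" "$dir/three" "$dir/four"

list=$(mktemp)
find "$dir" -type f > "$list"            # one path per line (assumes no newlines in names)
n=$(wc -l < "$list")                     # number of lines in the list

choice=
if (( n > 0 )); then
    # Combine two 15-bit draws so lists longer than 32768 lines are reachable.
    k=$(( ((RANDOM << 15) | RANDOM) % n + 1 ))   # sed line numbers start at 1
    choice=$(sed -n "${k}{p;q;}" "$list")        # print line k, then stop reading
    echo "$choice"
fi
rm -rf "$dir" "$list"                    # clean up the demo files
```

Repeating the draw gives more picks, but as with the array version you would have to track already-used line numbers to avoid duplicates.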

Look things over and let me know if you have any questions.

edited Aug 22 '20 at 20:50

answered Aug 22 '20 at 18:04

David C. Rankin


`RANDOM % n` has an uneven distribution when 65536 is not a multiple of n. To evenly distribute `RANDOM` regardless of divisor you compute the largest multiple of n that is less than 65536 for n=3: `n=3; m=$((n*65536/n))` gives `m=65535`. Then you rerun `while rnd=$RANDOM; do ((rnd – Léa Gris Aug 22 '20 at 19:22

Oh yes, `RANDOM` is far from perfect - with warts and all. For more than 16-bit values you can do some nutty stuff like concatenate two grabs, etc... Builtin `RANDOM` leaves much to be desired. – David C. Rankin Aug 22 '20 at 19:27
Shell arithmetic gives `*` and `/` equal precedence and evaluates them left to right, so you have to use `m=$((65536/n*n))` or `m=$((65536-65536%n))` – Léa Gris Aug 22 '20 at 19:36
Yes, depending on the refinements to `RANDOM` the OP may need, the tweak to eliminate some of the inherent bias may be warranted. On the other hand, if it's just a pick-3 and the quality of the random number produced isn't paramount, then `RANDOM` will work. If the OP was concerned about that random quality, such as for cryptographic purposes, then shell wouldn't be the appropriate tool for the job. – David C. Rankin Aug 22 '20 at 19:43
`a=($(find /path/to/files -type f))` won't work as intended when pathnames contain blank characters. – M. Nejat Aydin Aug 22 '20 at 20:02
@DavidC.Rankin You will need to adjust the `IFS` if `-print0` is to be used in the array assignment. BTW, I think the algorithm has another problem: a file may be chosen more than once by the algorithm in the code, which will cause repeated filenames to be output. – M. Nejat Aydin Aug 22 '20 at 20:10
That's what I did, I set `IFS` and restored it after filling the array. Good catch, thank you. – David C. Rankin Aug 22 '20 at 20:50
wouldn't there be a chance to get the same filename twice with this? – aJynks Aug 24 '20 at 09:58
Yes, technically you can loop collecting the random numbers in an array checking the previously stored numbers until you have unique randoms. – David C. Rankin Aug 24 '20 at 16:23
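The bias fix discussed in these comments can be sketched as rejection sampling: redraw whenever RANDOM lands at or above the largest multiple of n. This is only an illustration. `rand_below` and `rand_result` are hypothetical names, and the sketch uses 32768 rather than the 65536 quoted above, because Bash documents RANDOM as an integer between 0 and 32767.

```shell
#!/bin/bash
# Sketch: unbiased index in 0..n-1 by rejecting draws at or above the
# largest multiple of n. rand_below and rand_result are hypothetical names.
# Assumes 1 <= n <= 32768 (RANDOM yields 0..32767, i.e. 32768 values).
rand_below() {
    local n=$1 limit r
    limit=$(( 32768 - 32768 % n ))     # largest multiple of n that is <= 32768
    while r=$RANDOM; (( r >= limit )); do
        :                              # discard draws that would skew the result
    done
    rand_result=$(( r % n ))           # returned through a variable, not stdout
}

rand_below 3
echo "$rand_result"
```

The result comes back in a variable rather than through `$(rand_below 3)` so that the draw happens in the current shell instead of a command-substitution subshell.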

M. Nejat Aydin · Answer 2 · 2020-08-23T01:21:51.180

1

With GNU shuf util, it is a one-liner if there are only files (not subdirectories) in /path/ :

shuf -e -n3 /path/*

You can store the random pathnames in an array (named arr below) like this:

IFS=$'\n' read -d '' -r -a arr < <(shuf -e -n3 /path/*)

assuming pathnames don't contain a newline character. Or, as a slightly simpler alternative using mapfile (synonym of readarray):

mapfile -t arr < <(shuf -e -n3 /path/*)

If it is possible that pathnames contain newline characters, then:

mapfile -d '' arr < <(shuf -zen3 /path/*)

as remarked in the comments by Léa Gris. This last version will work with any valid pathname and should be preferred if you are using GNU utilities.
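To tie this back to the question's goal of looping over the result later, here is a small sketch. It assumes GNU shuf and Bash 4.4 or newer (for `mapfile -d`), and the demo directory and file names are hypothetical. Without `-r`, shuf does not repeat input lines, so the three picks are distinct.

```shell
#!/bin/bash
# Sketch: fill an array with three random paths via GNU shuf, then loop over it.
dir=$(mktemp -d)                             # hypothetical demo directory
touch "$dir/a" "$dir/b c" "$dir/d" "$dir/e"

mapfile -d '' arr < <(shuf -zen3 "$dir"/*)   # NUL-delimited, safe for any pathname

for f in "${arr[@]}"; do                     # use the picks later in the script
    printf 'picked: %s\n' "$f"
done
rm -rf "$dir"                                # clean up the demo directory
```

If the directory holds fewer than three entries, shuf simply returns all of them, so checking `${#arr[@]}` before the loop may be worthwhile.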

edited Aug 23 '20 at 01:21

answered Aug 22 '20 at 18:29

M. Nejat Aydin


You don't really need IFS or the read builtin for this. Consider `file_array=( $(shuf -en3 *) )` to create your array, rather than parsing redirection from process substitution. – Todd A. Jacobs Aug 22 '20 at 18:46
@ToddA.Jacobs That approach fails if the filenames contain blank characters. – M. Nejat Aydin Aug 22 '20 at 18:51
Preferably `null` delimited: `mapfile -d '' -t arr < <(shuf -zen3 /path/*)` – Léa Gris Aug 22 '20 at 19:09
@LéaGris Yes, that will work even if the pathnames contain a newline character. The `-t` option isn't needed in that case. – M. Nejat Aydin Aug 22 '20 at 19:15
