I want to select random units from a file, where each unit consists of 2 lines.

For example, a file looks like this:

Adam
Apple
Mindy
Candy
Steve
Chips
David
Meat
Carol
Carrots

And I want to randomly select, let's say, 2 units.

For example

Adam
Apple
David
Meat

or

Steve
Chips
Carol
Carrots

I've tried using shuf and sort -R, but they only shuffle single lines. Could someone help me please? Thank you.

palansuya

3 Answers

You could do it with shuf by joining the lines before shuffling (that might not be a bad idea for a file format in general, if the lines describe a single item):

$ < file sed -e 'N;s/\n/:/' | shuf | head -n 1 | tr ':' '\n'
Carol
Carrots

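The sed command loads two lines at a time (N appends the next line to the pattern space) and joins them with a colon, so each unit becomes a single line that shuf can shuffle. The final tr turns the colon back into a newline. This assumes the colon does not appear in the data.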
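Since the question asks for 2 units rather than 1, the same pipeline can probably be adapted with shuf's -n option. This is a minimal sketch, assuming GNU shuf and that no line in the file contains a colon. The temporary file and the variable names f and picked are only there for illustration; the file recreates the sample from the question.

```shell
# Recreate the sample file from the question in a temporary file.
f=$(mktemp)
printf '%s\n' Adam Apple Mindy Candy Steve Chips David Meat Carol Carrots > "$f"

# Join each pair of lines with a colon, keep 2 random joined lines,
# then split them back into separate lines.
picked=$(sed 'N;s/\n/:/' "$f" | shuf -n 2 | tr ':' '\n')
printf '%s\n' "$picked"
```

The output is 4 lines, that is 2 distinct units in random order. Changing the number after -n changes how many units are selected.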

ilkkachu
  • You don't need the p, just remove the -n flag. Also if your sed supports `\n` then you don't need the -e either and it accepts filenames as arg so you don't need `<`. It could be written as `sed 'N;s/\n/:/' file` – 123 Jun 13 '17 at 22:08

Pick a random number in the correct range, ensure that it is odd (if desired), then use sed to print the 2 lines:

$ a=$(expr $RANDOM % \( $(wc -l < input) / 2 \) \* 2 + 1)
$ sed -n -e ${a}p -e $((a+1))p input
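To see why the expression always lands on the first line of a unit, here is a rough equivalent using shell arithmetic instead of expr. It assumes a shell that provides $RANDOM (bash, ksh, zsh) and uses a temporary file in place of input. The variable names units and unit are introduced here and are not part of the commands above.

```shell
# Sample data from the question; the temporary file stands in for "input".
input=$(mktemp)
printf '%s\n' Adam Apple Mindy Candy Steve Chips David Meat Carol Carrots > "$input"

units=$(( $(wc -l < "$input") / 2 ))   # 10 lines give 5 units
a=$(( RANDOM % units * 2 + 1 ))        # one of 1, 3, 5, 7, 9: the first line of a unit
unit=$(sed -n -e "${a}p" -e "$((a + 1))p" "$input")
printf '%s\n' "$unit"
```

RANDOM % units picks a unit index from 0 to 4, multiplying by 2 and adding 1 converts it to that unit's first line number, and sed then prints that line and the one after it.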
William Pursell

Rather than selecting lines to print, you could walk the file and print each "unit" with a particular probability. For example, to print (roughly) 10% of the "units" in the file, you could do:

awk 'BEGIN{srand()} NR%2 && (rand() < .1) {print; getline; print}' input
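A slightly more defensive variant of that one-liner might look like the sketch below. The -v variables p (per-unit probability) and seed are names chosen here, not part of the original command. Passing an explicit seed is one way around the fact that srand() with no argument seeds from the time of day, so two runs within the same second would select the same units. The getline check covers a file with an odd number of lines.

```shell
# Sample data from the question; temporary files stand in for "input" and the result.
input=$(mktemp)
out=$(mktemp)
printf '%s\n' Adam Apple Mindy Candy Steve Chips David Meat Carol Carrots > "$input"

# On each odd-numbered line, decide with probability p whether to print the unit.
# The shell's process ID ($$) is used as a simple seed; this is only an assumption
# about what is "random enough" for the task.
awk -v p=0.5 -v seed="$$" '
  BEGIN { srand(seed) }
  NR % 2 && rand() < p {
    print                                   # first line of the unit
    if ((getline second) > 0) print second  # second line, if the file has one
  }' "$input" > "$out"
cat "$out"
```

The number of units printed varies from run to run and may be zero, so this approach suits sampling a fraction of a large file rather than picking exactly 2 units.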
William Pursell