0

I have a table file and I want to shuffle the rows of specific columns in Perl.

For example, I have this table:

a 1
b 2
c 3
d 4
e 5
f 6

and I want to shuffle the second column to get something like this:

a 2
b 1
c 3
d 4
e 5
f 6
Jean Paul

3 Answers

2

Using List::Util::shuffle might be a good idea. I used a Schwartzian transform to create a list of random numbers, sort them, and insert the column data based on the array index.

use strict;
use warnings;
use feature 'say';

my @col;
while (<DATA>) {
    push @col, [ split ];
}
my @shuffled = map { $col[$_->[0]][1] }      # map to @col values
               sort { $a->[1] <=> $b->[1] }  # sort based on rand() value
               map { [ $_, rand() ] }        # each index mapped into array of index and rand()
               0 .. $#col;                   # list of indexes of @col
for my $index (0 .. $#col) {
    say join " ", $col[$index][0], $shuffled[$index];
}
__DATA__
a 1
b 2
c 3
d 4
e 5
f 6
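
For comparison, here is a minimal sketch of the direct `List::Util::shuffle` approach mentioned at the start of this answer. The helper name `shuffle_column` and the inline sample rows are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: returns copies of the rows with the values of one
# 0-based column randomly permuted; all other columns stay in place.
sub shuffle_column {
    my ($rows, $col) = @_;
    my @values = shuffle map { $_->[$col] } @$rows;
    return map {
        my @row = @{ $rows->[$_] };   # copy so the input is not modified
        $row[$col] = $values[$_];
        \@row;
    } 0 .. $#$rows;
}

# Sample rows matching the table in the question.
my @rows = map { [ split ] } ("a 1", "b 2", "c 3", "d 4", "e 5", "f 6");
print "@$_\n" for shuffle_column(\@rows, 1);
```

Because the column values are stored in an array rather than a hash, duplicate values and the original row order are both preserved.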
TLP
  • Nice solution. It uses a sort, which may be more CPU intensive than a simple shuffle (I think it is O(n log n) vs O(n)), but it works. This script would also need to be adapted to work on any table text file. – Jean Paul Nov 27 '20 at 17:12
  • @JeanPaul Thank you, it's mainly just a demonstration of how cool Schwartzian transforms are. – TLP Nov 27 '20 at 17:31
1

I can use this script to do the job:

#!/usr/bin/env perl

use strict;
use warnings;

use List::Util qw/shuffle/;

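# First argument: comma-separated list of 1-based column numbers to shuffle.
my @c = split /,/, shift;
$_-- for @c;    # convert to 0-based indexes

# Read every remaining input line as an array of whitespace-separated fields.
my @lines;
while ( <> ) {
    push @lines, [ split ];
}

# Random permutation of the row indexes.
my @order = shuffle 0 .. $#lines;

for my $l (0 .. $#lines) {
    my @items = @{ $lines[$l] };
    # Take the selected columns from the randomly chosen row.
    @items[@c] = @{ $lines[ $order[$l] ] }[@c];
    print "@items\n";
}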

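This script uses List::Util, which has been a Perl core module since perl v5.7.3 (this can be checked with `corelist List::Util`).

It can be launched with `perl shuffle.pl 2 test.txt`, where the first argument is a comma-separated list of 1-based column numbers to shuffle.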

Jean Paul
  • Thanks for pointing this out. I know that Perl is not the only language to use 0 as the first index for arrays, but I find it unnatural, especially since Perl is not C, and an array is more than a pointer to a slot in memory. – Jean Paul Nov 24 '20 at 20:42
  • I don't know, new Perl versions arrived but the default one used in Unix systems did not change for a long time, I'm not sure it will happen soon. – Jean Paul Nov 24 '20 at 20:50
  • I don't know why the comment by the other guy above was removed but I looked more closely at the documentation about `$[` (https://perldoc.perl.org/perlvar#$%5B), and I saw that it no longer works starting from Perl v5.30.0 even without doing `use v5.16`, so I had to remove the use of `$[=1` from my script :(. – Jean Paul Nov 27 '20 at 16:38
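
Since `$[` can no longer be set to 1, the 1-based column numbers given on the command line have to be converted explicitly, as the script above does with `$_-- for @c`. A slightly more defensive sketch of that conversion follows; the helper name `parse_columns` and its error message are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a spec such as "2,3" (1-based column numbers)
# into 0-based array indexes, rejecting anything that is not a positive integer.
sub parse_columns {
    my ($spec) = @_;
    my @cols = split /,/, $spec;
    for my $c (@cols) {
        die "Invalid column number '$c'\n" unless $c =~ /^[1-9][0-9]*$/;
    }
    return map { $_ - 1 } @cols;
}

print join(",", parse_columns("2,3")), "\n";   # prints "1,2"
```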
-1

Demo code for a case when external modules are not permitted.

use strict;
use warnings;
use feature 'say';

my %data;

while( <DATA> ) {
    my($l,$d) = split;
    $data{$l} = $d;
}

say '- Loaded --------------------';
say "$_ => $data{$_}" for sort keys %data;

for( 0..5 ) {
    @data{ keys %data} = @{ shuffle([values %data]) };
    say "-- $_ " . '-' x 24;
    say "$_ => $data{$_}" for sort keys %data;
}

sub shuffle {
    my $data = shift;
    
    my($seen, $r, $i);
    my $n = $#$data;
    
    for ( 0..$n ) {
        do {
            $i = int(rand($n+1));
        } while defined $seen->{$i};
        $seen->{$i} = 1;
        $r->[$_] = $data->[$i];
    }
    
    return $r;
}

__DATA__
a 1
b 2
c 3
d 4
e 5
f 6

Output

- Loaded --------------------
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
-- 0 ------------------------
a => 5
b => 4
c => 2
d => 6
e => 1
f => 3
-- 1 ------------------------
a => 3
b => 6
c => 2
d => 4
e => 1
f => 5
-- 2 ------------------------
a => 4
b => 5
c => 6
d => 1
e => 3
f => 2
-- 3 ------------------------
a => 6
b => 4
c => 1
d => 2
e => 3
f => 5
-- 4 ------------------------
a => 3
b => 4
c => 6
d => 5
e => 1
f => 2
-- 5 ------------------------
a => 6
b => 5
c => 3
d => 4
e => 2
f => 1
Polar Bear
  • You can't use hash keys to store table data. What if there are duplicates? You can't even retain the original order of column 1 because hashes are not ordered. Just because they happen to be unique and sortable in the sample data doesn't mean they will be in a real scenario. A shuffle function with a while loop that waits for a free array index will be extremely inefficient for larger arrays. Imagine a file with 10000 lines, with 9999 numbers taken: each roll will have a 1/10000 chance to hit the last array index. – TLP Nov 25 '20 at 18:16
  • @TLP - OP should describe the expected input data better. Your points about the index computation algorithm are valid. The hash was used for demonstration purposes only; two arrays can be used instead to store the elements. – Polar Bear Nov 26 '20 at 02:12
  • The table I provided in my question was just an example; I want a script able to work on any table text file as input. – Jean Paul Nov 27 '20 at 16:48
  • Also note that the shuffling algorithm you proposed is not optimal, since it can take a long time to find the last free indices. See the way it is implemented in List::Util, where the complexity is reduced to O(n): https://stackoverflow.com/a/5168324/4374441 – Jean Paul Nov 27 '20 at 22:08
  • @JeanPaul, what would you do if no modules were allowed on the system for security reasons? Did you take into account UTF-8 keys and values for your tables? At the very least, you must describe the problem in more detail. There is no single solution that fits all possible cases (such a solution grows exponentially in size and becomes inefficient). – Polar Bear Nov 28 '20 at 04:31
  • One solution would be to copy the implementation of `List::Util::shuffle`, which is very short. As for UTF-8 characters, I think that is too specific; I should have written "any table with normal values inside", like the ones we encounter in everyday life. – Jean Paul Nov 30 '20 at 09:22
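
For the no-modules case discussed in these comments, here is a minimal pure-Perl sketch of a Fisher-Yates shuffle. It runs in O(n) because each position is swapped exactly once, instead of re-rolling until a free index turns up. This is an illustrative version, not a copy of the `List::Util` source, and the name `fisher_yates_shuffle` is an assumption.

```perl
use strict;
use warnings;

# Returns a shuffled copy of the list. Every permutation is equally likely,
# up to the quality of rand().
sub fisher_yates_shuffle {
    my @items = @_;
    for (my $i = $#items; $i > 0; $i--) {
        my $j = int rand($i + 1);          # random index in 0 .. $i
        @items[$i, $j] = @items[$j, $i];   # swap the two elements
    }
    return @items;
}

print join(" ", fisher_yates_shuffle(1 .. 6)), "\n";
```

It could stand in for the `shuffle` sub of the answer above by calling it as `fisher_yates_shuffle(values %data)` and assigning the returned list to the hash slice.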