
I have a matrix that's 490 rows (features; F1..F490) and 350 columns (350 samples; s1..s350). The first columns look like this:

Drug    T   T   T   C   T
Sample  s1  s2  s3  s4  s5 .....
Pair    16  81 -16  32 -81 .....
Cond    B   D    B   B  D  .....
F1      34  23   12     9  .....
F2      78       11  87 10 .....
...

(Some values are missing; that is expected.)

There are 2 conditions (B and D) and 2 drugs (T and C). The samples are paired: for example, s1 and s3 are a pair because their Pair values are the same in absolute value.

What I'm trying to do is permute the drug labels 1000 times while preserving the pairing information (Pair value). The two samples in a pair always share the same condition (B in this case) and the same absolute Pair value (16 and -16 in this case), and they must also share the same drug label. Example: s1 and s3 are a pair; they have the same absolute Pair value, are both B, and both have the drug label T.

So one of the 1000 permuted files could look like this, for example:

Drug    C   T   C   T   T
Sample  s1  s2  s3  s4  s5 .....
Pair    16  81 -16  32 -81 .....
Cond    B   D    B   B  D  .....
F1      34  23   12     9  .....
F2      78       11  87 10 .....
...

I don't mind if the samples are not in order.

I've tried permute and sample (in R), but I can't find a way to do it while respecting the constraints described above. I'm sorry if this is obvious.

I want to use these permuted files (n=1000) for a downstream analysis that I have already coded.

Thank you very much for your input.

  • Wouldn't you just need to sample the Drug feature with replacement and keep the rest of the data frame the same? Simply use `sample(c("T","C"), 351, replace = TRUE)` for every permutation. – Mankind_008 Jul 05 '18 at 21:02
  • but wouldn't the integrity of the pairing be compromised? Because every pairing has to have the same drug label (eg; s1 and s3 have to have either T or C both, but not T and C). Sorry it wasn't clear, I added it in the original question. – snowy_squirrel Jul 05 '18 at 21:06
  • Okay, I understand your requirement now. One more thing: if the samples are paired, how come the total number of samples is odd? Is one of the samples unpaired? – Mankind_008 Jul 05 '18 at 21:08
  • Sorry again, I corrected it: there are indeed 350 samples, but 351 columns because the first column holds the feature names etc. – snowy_squirrel Jul 05 '18 at 21:15
  • One approach: (1) record all pairings (by processing the first row) (2) generate permutations of the header ("drug") one by one (3) compare each with pairings and reject if it messes it up. (*) If most are paired and there are cross-pairings this may be too slow, in which case you can devise a permutation generator to use known pairings. – zdim Jul 05 '18 at 21:28

2 Answers


Identify the column indexes of the pairs, find the drug associated with the pairs, shuffle the drugs, then assign the shuffled drugs back to the pairs.

use List::Util qw( shuffle );

my @matrix = (
   [ 'Drug',    'T',   'T',  'T',   'C',  'T',   ..... ],
   [ 'Sample',  's1',  's2', 's3',  's4', 's5',  ..... ],
   [ 'Pair',    '16',  '81', '-16', '32', '-81', ..... ],
   [ 'Cond',    'B',   'D',  'B',   'B',  'D',   ..... ],
   [ 'F1',      '34',  '23', '12',  '',   '9',   ..... ],
   [ 'F2',      '78',  '',   '11',  '87', '10',  ..... ],
);

# Group column indexes by pair: key is "<abs(Pair)>:<Cond>".
my %pair_col_idxs_by_key;
{
   my ($drug_row, $pair_row, $cond_row) = @matrix[0, 2, 3];
   for my $col_idx (1..$#$drug_row) {
      my $key = join(":", abs($pair_row->[$col_idx]), $cond_row->[$col_idx]);
      push @{ $pair_col_idxs_by_key{$key} }, $col_idx;
   }
}

my @all_pair_col_idxs = values(%pair_col_idxs_by_key);
my @drugs = map { $matrix[ 0 ][ $_->[0] ] } @all_pair_col_idxs;

@drugs = shuffle @drugs;

# Keep reshuffling until you get a previously unseen result.

for my $i (0..$#all_pair_col_idxs) {
   my $pair_col_idxs = $all_pair_col_idxs[$i];
   my $drug          = $drugs[$i];

   $matrix[0][$_] = $drug for @$pair_col_idxs;
}
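
For readers working in R, as the question does, here is a minimal base-R sketch of the same steps, including the "reshuffle until previously unseen" note above. The helper name `distinct_permutations` and its `max_tries` guard are assumptions made for illustration, not part of the original answer; the toy matrix only copies the five columns shown in the question. Note that the original labelling counts as one possible arrangement.

```r
# Character matrix in the question's layout: rows are Drug/Sample/Pair/Cond,
# columns are samples. Toy values copied from the question's first columns.
m <- rbind(
  Drug   = c("T", "T", "T", "C", "T"),
  Sample = c("s1", "s2", "s3", "s4", "s5"),
  Pair   = c("16", "81", "-16", "32", "-81"),
  Cond   = c("B", "D", "B", "B", "D")
)

# Key each column by |Pair| and Cond, as the Perl code does.
key  <- paste(abs(as.numeric(m["Pair", ])), m["Cond", ], sep = ":")
keys <- unique(key)
pair_drug <- m["Drug", match(keys, key)]  # one drug label per pair

# Hypothetical helper: collect up to n_wanted distinct permuted Drug rows.
# max_tries is an assumed guard so the loop ends when few arrangements exist.
distinct_permutations <- function(n_wanted, max_tries = 100 * n_wanted) {
  seen <- character(0)
  out  <- list()
  for (i in seq_len(max_tries)) {
    if (length(out) >= n_wanted) break
    # Shuffle labels across pairs, then expand back to one label per column.
    shuffled  <- pair_drug[sample.int(length(pair_drug))]
    drug_row  <- unname(shuffled[match(key, keys)])
    signature <- paste(drug_row, collapse = "")
    if (!(signature %in% seen)) {   # keep only previously unseen results
      seen <- c(seen, signature)
      out[[length(out) + 1]] <- drug_row
    }
  }
  out
}

set.seed(42)  # only to make the sketch reproducible
perms <- distinct_permutations(3)
```

Each element of `perms` can be written back with `m["Drug", ] <- perms[[i]]` before saving the matrix. With only a handful of pairs there are few distinct arrangements (three in this toy case), so the function may return fewer than `n_wanted` results.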
ikegami

Given the data df: group by the absolute value of Pair, permute Drug across the resulting groups (one label per pair), then join the permuted labels back onto the samples by the absolute value of Pair. Using dplyr:

t_df <- as.data.frame(t(df))                    # transposed to use features as cols
t_df$Pair <- as.numeric(as.character(t_df$Pair))

library(dplyr)

# Wrap this into a function to call/ permute 1000 times
df_out <- t_df %>% mutate(abs_pair = abs(Pair)) %>%
          group_by(abs_pair) %>% filter(row_number() == 1) %>%   # one row per pair
          ungroup() %>% mutate(Permuted_drug = sample(Drug, n())) %>%
          select(abs_pair, Permuted_drug) %>%
          inner_join(t_df %>% mutate(abs_pair = abs(Pair)), by = "abs_pair")

df_out
#  abs_pair Permuted_drug Drug  Sample  Pair Cond 
#     <dbl> <fct>         <fct> <fct>  <dbl> <fct>
#1       16 T             T     s1        16 B    
#2       16 T             T     s3       -16 B    
#3       81 C             T     s2        81 D    
#4       81 C             T     s5       -81 D    
#5       32 T             C     s4        32 B    
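
The "wrap this into a function" comment above could be realised roughly as follows. This is a base-R sketch under stated assumptions rather than the answer's own code: the helper name `permute_drug_by_pair` is hypothetical, and it expects the transposed layout (one row per sample, with `Drug` and `Pair` columns), rebuilt here as toy data so the block runs on its own.

```r
# Hypothetical helper: permute drug labels across pairs so that both members
# of a pair always receive the same label. Expects columns Drug and Pair.
permute_drug_by_pair <- function(t_df) {
  abs_pair <- abs(t_df$Pair)
  pair_ids <- unique(abs_pair)
  # One drug label per pair, taken from the pair's first sample.
  pair_drug <- as.character(t_df$Drug[match(pair_ids, abs_pair)])
  # Shuffle the labels across pairs (a permutation, no replacement).
  shuffled <- pair_drug[sample.int(length(pair_drug))]
  # Write the shuffled label back to every sample of each pair.
  t_df$Permuted_drug <- shuffled[match(abs_pair, pair_ids)]
  t_df
}

# Toy data in the transposed layout (one row per sample).
t_df <- data.frame(
  Drug   = c("T", "T", "T", "C", "T"),
  Sample = c("s1", "s2", "s3", "s4", "s5"),
  Pair   = c(16, 81, -16, 32, -81),
  Cond   = c("B", "D", "B", "B", "D"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the sketch reproducible
# A list of 1000 data frames, one per permutation.
permutations <- replicate(1000, permute_drug_by_pair(t_df), simplify = FALSE)
```

Each element of `permutations` can then be transposed back with `t()` and written out for the downstream analysis. Because the labels are shuffled rather than resampled, the number of pairs assigned to each drug stays the same in every permutation.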

Data Used:

df <- read.table(text = "Drug    T   T   T   C   T
Sample  s1  s2  s3  s4  s5
Pair    16  81 -16  32 -81
Cond    B   D    B   B  D", row.names = 1)
Mankind_008