I have created a PDL matrix and need to do a pairwise comparison between its rows. Currently I am using the 'where' and 'cov' commands to return the pairwise comparison for two slices (generated in a Perl loop).

My question: How can I use 'range' and 'slice' to loop over the rows in a pairwise fashion? How can I return my index position? I have looped over the matrix using perl. I have read that looping with perl really cripples the power of PDL.

Desired output:

indexA indexB Value
pos1   pos5   1
pos1   pos6   5
pos1   pos0   7 

To be clear I only want to use PDL functionality.

Here is some pseudocode that will (hopefully) illustrate my point better.

p $b

[
 [1 0 3 0]
 [0 1 0 1]
 [1 3 1 3]   <- example piddle y
 [0 1 0 1]   <- example piddle z
]

my concept function{


slice $b (grab row z) - works fine
slice $b (grab row y) - works fine


($a, $b) = where($a,$b, $a < 3 && $b < 3 ) - works fine

p $a [1 1] 
p $b [0  0] 

cov($a $b) - works just fine.

}

I just need a way to execute pairwise across all rows. I will need to do factorial(n rows) comparisons.

2 Answers

PDL threading is the concept you are looking for here. The general technique for looping along dimensions is to add dummy dimensions in the appropriate places so that the calculation generates the implicit threadloops you need. For a multi-dimensional problem, there can be several different ways to add dims and hence to create the threadloops.

For your pairwise row calculation, one option is two nested Perl loops over the two row indexes, slicing out one row for each and letting PDL thread along the elements of the rows. Another option is a single Perl loop over one row index, taking advantage of implicit threadlooping to calculate against all rows at once.

A fully PDL-threadloop computation adds a dummy dimension to each of the two arguments, one for each loop over rows, so that all N**2 row calculations are performed at once. Here is an example for a shape [4,3] array with the calculation being the == operator:

pdl> $b = floor(random(4,3)*5)

pdl> p $b

[
 [0 4 3 3]
 [3 3 4 2]
 [4 0 1 4]
]

pdl> p $b(,*3)==$b(,,*3)

[
 [
  [1 1 1 1]
  [0 0 0 0]
  [0 0 0 0]
 ]
 [
  [0 0 0 0]
  [1 1 1 1]
  [0 0 0 0]
 ]
 [
  [0 0 0 0]
  [0 0 0 0]
  [1 1 1 1]
 ]
]

The result is a piddle of shape [4,3,3]. The 0th dimension holds the elementwise result for each pair of rows, and the 1st and 2nd dimensions correspond to the indexes of the two rows involved in the == operation.
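The `$b(,*3)` syntax in the session above relies on PDL::NiceSlice, which the pdl shell enables for you. In a plain script the same dummy dimensions can be added with the `dummy` method. The following is a minimal sketch under that assumption, using the fixed values printed above rather than `random` so that the run is reproducible; the variable names `$m`, `$eq` and `$same` are only illustrative.

```perl
use strict;
use warnings;
use PDL;

# Same values as the example above: 4 columns (dim 0) by 3 rows (dim 1).
my $m = pdl([[0, 4, 3, 3], [3, 3, 4, 2], [4, 0, 1, 4]]);
my $n = $m->dim(1);

# dummy(1, $n) inserts a new dim 1 and dummy(2, $n) a new dim 2,
# so the comparison threads over every pair of rows.
my $eq = $m->dummy(1, $n) == $m->dummy(2, $n);
print join(',', $eq->dims), "\n";

# Collapse dim 0: nonzero only where two rows agree in every element.
my $same = $eq->andover;
print $same;
```

Reducing along dim 0 with `andover` turns the [4,3,3] result into one value per row pair, which is usually the shape you want for a pairwise comparison.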

If you need an index value from or for one of these threadloop calculations, use xvals, yvals, zvals, or axisvals to generate a piddle holding the index values along the corresponding array axis.

pdl> p $b->xvals

[
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
]


pdl> p $b->yvals

[
 [0 0 0 0]
 [1 1 1 1]
 [2 2 2 2]
]
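Putting the pieces together for the question's data, here is a hedged sketch of a fully threaded pairwise calculation that also recovers the row indexes. It is not the `cov` routine from the question: the covariance is written out by hand as E[xy] - E[x]E[y], dividing by the number of retained elements, which is an assumption and may differ from the normalization your `cov` uses. It also assumes every row pair keeps at least one element after filtering. All variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $dat = pdl([[1, 0, 3, 0], [0, 1, 0, 1], [1, 3, 1, 3], [0, 1, 0, 1]]);
my $n   = $dat->dim(1);

# Two views of the rows with dummy dims, so every row meets every row.
my $x = $dat->dummy(1, $n);
my $y = $dat->dummy(2, $n);

# The question's filter: keep positions where both rows are below 3.
my $mask = ($x < 3) & ($y < 3);
my $cnt  = $mask->sumover;

# Masked means and covariance, dividing by the count (an assumption).
my $mean_x = ($x * $mask)->sumover / $cnt;
my $mean_y = ($y * $mask)->sumover / $cnt;
my $cov    = ($x * $y * $mask)->sumover / $cnt - $mean_x * $mean_y;

# Index pairs with first index < second, then the matching values.
my $pairs = whichND($cov->xvals < $cov->yvals);   # dims: 2 x number of pairs
my $vals  = $cov->indexND($pairs);

print $pairs, $vals;
```

Each column of `$pairs` holds the two row indexes and the element of `$vals` at the same position holds the value for that pair, which corresponds to the indexA, indexB, Value layout asked for in the question.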

There are a lot of details relating to the implementation of the PDL threading (not the same as perl threading or posix threads). I recommend the perldl mailing list for reference and discussion with other PDL users and developers. Also, see the first on-line draft of the PDL Book which has more comprehensive coverage of PDL computation and threading.

chm

I think what you're looking for is a method to find all different pairs of rows in the array and then process each pair using cov? If that's correct then I haven't heard of cov and a quick search through the documentation doesn't help. However I can say a few things that may help.

I think you're being overly cautious about dropping out of PDL into Perl code. That is fine if all you are doing is looping over the indices of all row pairs and pulling those rows out using slice, as the sample code below shows.

Also, you can't call where like that: $a < 3 and $b < 3 are piddles themselves, and the boolean && operator won't do what you want on them. Use the elementwise & operator instead, and add parentheses to make sure the expression is evaluated in the right order.
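As a small illustration of that point, here is a sketch using rows y and z from the question. The names `$y`, `$z`, `$ys` and `$zs` are illustrative only.

```perl
use strict;
use warnings;
use PDL;

my $y = pdl(1, 3, 1, 3);   # row y from the question
my $z = pdl(0, 1, 0, 1);   # row z from the question

# Each comparison yields a mask piddle; & combines the masks elementwise.
# The parentheses make the intended grouping explicit.
my ($ys, $zs) = where($y, $z, ($y < 3) & ($z < 3));

print "$ys $zs\n";
```

With these inputs only positions 0 and 2 pass the filter, which matches the `[1 1]` and `[0 0]` shown in the question.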

Beyond that I can't help unless you correct my understanding of your question or direct me to some documentation of the cov subroutine.

use strict;
use warnings;

use PDL;

my $dat = pdl <<END;
[
 [1 0 3 0]
 [0 1 0 1]
 [1 3 1 3]
 [0 1 0 1]
]
END

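# Index of the last row (dimension 1 counts the rows).
my $max2 = $dat->dim(1) - 1;

# Visit each unordered pair of distinct rows exactly once.
for my $i (0 .. $max2 - 1) {
  for my $j ($i + 1 .. $max2) {

    my $row1 = $dat->slice(",($i)");
    my $row2 = $dat->slice(",($j)");

    # Keep only the positions where both rows are below 3.
    ($row1, $row2) = where($row1, $row2, ($row1 < 3) & ($row2 < 3));

    # cov is not loaded by "use PDL"; load the module that provides yours.
    cov($row1, $row2);
  }
}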
Borodin
  • Note that the way you initialize `$dat` could be much, much cleaner. You don't need a heredoc. Just use a multiline string. The lack of newlines in comments makes it difficult, but if you just put the letter `q` in front of the opening square bracket and remove the heredoc cruft, it'd work fine: `$dat = q[ 1 0 3 0; 0 1 0 1; 1 3 1 3; 0 1 0 1];` Notice the `q` just before the square bracket turns the square bracket into a Perl quote, while maintaining the intent. – David Mertens Jun 10 '14 at 22:01
  • @DavidMertens: I know how Perl quotes work. I used a heredoc so that I could copy the data straight from the question – Borodin Jun 10 '14 at 22:14
  • The data can be copied straight from the question using the `q` operator. See http://pdl.perl.org/PDLdocs/Core.html#pdl – David Mertens Jul 08 '14 at 21:45
  • @DavidMertens: Anything can be quoted using the `qq` operator. Why do you think heredocs exist at all? – Borodin Jul 09 '14 at 10:49