Let the size of my matrix A be sx x sy x sz. I want to get a matrix B (the same size as A) where the element of B at (x,y,z) is the average of the n x n x n submatrix of A taken at the same position.

How can I do it?

I know I can do this using convn or with 3 for loops, but either way it is very slow.

For a matrix A of size 200 x 200 x 150 in double precision, using n = 9 takes around 20 to 30 seconds on my machine.
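In case it helps, the convn version I mentioned looks roughly like this (just a sketch; the borders get zero-padded, so edge values are not true averages):

n = 9;
kernel = ones(n,n,n) / n^3;      % box kernel: averages an n x n x n neighbourhood
B = convn(A, kernel, 'same');    % 'same' keeps B the same size as A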

  • `convn` is likely the fastest way to do this so if that's too slow for you, then we probably don't have much to suggest. – Suever Jun 21 '16 at 16:00
  • Try to exploit GPU if available to you. GPUs excel at doing convolutions. – Autonomous Jun 21 '16 at 16:02
  • Can you provide the specs of your machine? For me, it takes on average about 1 second... which I consider quite fast for that relatively large amount of data and the kernel you specified. It may simply be your machine that is limiting performance. – rayryeng Jun 21 '16 at 16:38

1 Answer

Consider using imfilter from the Image Processing Toolbox. imfilter is essentially an optimized convolution and correlation pipeline that takes advantage of the Intel Integrated Performance Primitives. As a simple test, let's create a random double-precision 200 x 200 x 150 3D matrix and find the average of 9 x 9 x 9 neighbourhoods as you said:

A = rand(200,200,150);          % random double-precision 200 x 200 x 150 test volume
kernel = (1/9^3)*ones(9,9,9);   % 9 x 9 x 9 box kernel that computes the neighbourhood average
B = imfilter(A, kernel);        % B is the same size as A

This actually runs quite fast on my machine. My specs are a MacBook Pro with 16GB of RAM running a 2.3 GHz Intel Core i7 processor.
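One note on the borders, in case that matters for your data: by default imfilter pads with zeros, so elements near the edges are not averages of real data. You can ask for replicated border padding instead:

B = imfilter(A, kernel, 'replicate');   % repeat the nearest border element instead of zero-padding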

Just to satisfy curiosity, I used timeit to time how long this operation takes once you allocate the kernel and the matrix:

A = rand(200,200,150);
kernel = (1/9^3)*ones(9,9,9);
B = imfilter(A, kernel);
t = timeit(@() imfilter(A, kernel));

On average, the imfilter call takes about 1.1393 seconds:

>> t

t =

    1.1393

Unless you migrate over to GPUs, this to me is the fastest way you're ever going to get it... especially because you have 150 slices of 200 x 200 2D data to process, with each output point collecting a 9 x 9 x 9 volume of elements.
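If you do have a supported GPU and the Parallel Computing Toolbox, a rough sketch of that route would be the following (I haven't timed it on your data, but imfilter accepts gpuArray inputs):

Agpu = gpuArray(A);                 % push the volume onto the GPU
kernel = (1/9^3)*ones(9,9,9);
Bgpu = imfilter(Agpu, kernel);      % imfilter runs on the GPU for gpuArray inputs
B = gather(Bgpu);                   % copy the result back to host memory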
