I have a 3-D array (below, z
), e.g. representing a succession of 2D arrays (below, a1
and a2
) in time. I want to select some values for all these 2D arrays along their axes (condition on two reference axes (x
and y
below), and then perform some operation (e.g., mean, sum, ...) on the resulting succession of "smaller" 2D arrays.
The code below proposes several ways to do that. I find solution1
very inelegant, but it seems to perform faster than solution2
. Why is it so, and is there a better way (more concise, and efficient (speed and memory)) to do that?
Regarding Step2, which one is the best option, are there any other, more efficient ones, and why is calculation C2
not working? Thanks!
[source of inspiration: Get mean of 2D slice of a 3D array in numpy ]
import numpy
import time
# Control parameters (to be modified to make different tests)
xx=1000
yy=6000
# Some 2D arrays, z is a 3D array containing a succesion of such arrays (2 here)
a1=numpy.arange(xx*yy).reshape((yy, xx))
a2=numpy.linspace(0,100, num=xx*yy).reshape((yy, xx))
z=numpy.array((a1, a2))
# Axes x and y along which conditioning for the 2D arrays is made
x=numpy.arange(xx)
y=numpy.arange(yy)
# Condition is on x and y, to be applied on a1 and a2 simultaneously
xmin, xmax = xx*0.4, xx*0.8
ymin, ymax = yy*0.2, yy*0.5
xcond = numpy.logical_and(x>=xmin, x<=xmax)
ycond = numpy.logical_and(y>=ymin, y<=ymax)
def solution1():
xcond2D = numpy.tile(xcond, (yy, 1))
ycond2D = numpy.tile(ycond[numpy.newaxis].transpose(), (1, xx))
xymask = numpy.logical_not(numpy.logical_and(xcond2D, ycond2D))
xymaskzdim = numpy.tile(xymask, (z.shape[0], 1, 1))
return numpy.ma.MaskedArray(z, xymaskzdim)
def solution2():
return z[:,:,xcond][:,ycond, :]
start=time.clock()
z1=solution1()
end=time.clock()
print "Solution1: %s sec" % (end-start)
start=time.clock()
z2=solution2()
end=time.clock()
print "Solution2: %s sec" % (end-start)
# Step 2
# Now compute some calculation on the resulting z1 or z2
print "A1: ", z2.reshape(z2.shape[0], z2.shape[1]*z2.shape[2]).mean(axis=1)
print "A2: ", z1.reshape(z1.shape[0], z1.shape[1]*z1.shape[2]).mean(axis=1)
print "B1: ", z2.mean(axis=2).mean(axis=1)
print "B2: ", z1.mean(axis=2).mean(axis=1)
print "Numpy version: ", numpy.version.version
print "C1: ", z2.mean(axis=(1, 2))
print "C2: ", z1.mean(axis=(1, 2))
Output:
Solution1: 0.0568935728474 sec
Solution2: 0.157177904729 sec
A1: [ 2.10060000e+06 3.50100058e+01]
A2: [2100600.0 35.01000583500077]
B1: [ 2.10060000e+06 3.50100058e+01]
B2: [2100600.0 35.010005835000975]
Numpy version: 1.7.1
C1: [ 2.10060000e+06 3.50100058e+01]
C2:
TypeError: tuple indices must be integers, not tuple