1

Possible Duplicate:
Big Satellite Image Processing

I'm trying to run Mort Canty's Python iMAD implementation on bitemporal RapidEye Multispectral images. Which basically calculates the canonical correlation for the two images and then substracts them. The problem I'm having is that the images are of 5000 x 5000 x 5 (bands) pixels. When I run this on the whole image my computer crashes terribly and I have to turn it off.

Does anyone have an idea what can make python crash the computer like this? If I for example choose 2999x2999 pixels per band everything runs fine.

8 gbs ram, I7-2617M 1.5 1.5 ghz, Windows7 64 bits. Im using the 64 bit version of everything: python(2.7), numpy, scipy and gdal.

thank you in advance!

    def covw(dm,w):
    # weighted covariance matrix and means 
    # from (transposed) data array    
       N = size(dm,0) 
       n = size(w)
       sumw = sum(w)
       ws = tile(w,(N,1))
       means = mat(sum(ws*dm,1)/sumw).T
       means = tile(means,(1,n))
       dmc = dm - means
       dmc = multiply(dmc,sqrt(ws))
       covmat = dmc*dmc.T/sumw
       return (covmat,means)

    def main():
    # ------------test---------------------------------------------------------------    
    if len(sys.argv) == 1:        
    (sys.argv).extend(['-p','[0,1,2,3,4]','-  
    d','[0,4999,0,4999]',
'c://users//pythonxy//workspace//1uno.tif','c://users//pythonxy//workspace//2dos.tif'])
    # -------------------------------------------------------------------------------        

options, args = getopt.getopt(sys.argv[1:],'hp:d:')
pos = None
dims = None            
for option, value in options:
    if option == '-h':
        print 'Usage: python %s [-p "bandPositions" -d "spatialDimensions"] 
        filename1   filename2' %sys.argv[0]
        print '       bandPositions and spatialDimensions are quoted lists, 
        e.g., -p "[0,1,3]" -d "[0,400,0,400]"  \n'
        sys.exit(1) 
    elif option == '-p':
        pos = eval(value)
    elif option == '-d':
        dims = eval(value) 
if len(args) != 2:
    print 'Incorrect number of arguments'
    print 'Usage: python %s [-p "bandspositions" -d "spatialdimensions"] 
    filename1 filename2 \n' %sys.argv[0]
    sys.exit(1)                                    
gdal.AllRegister()
fn1 = args[0]
fn2 = args[1]
path = os.path.dirname(fn1)
basename1 = os.path.basename(fn1)
root1, ext = os.path.splitext(basename1)
basename2 = os.path.basename(fn2)
outfn = path+'\\MAD['+basename1+'-'+basename2+']'+ext
inDataset1 = gdal.Open(fn1,GA_ReadOnly)     
inDataset2 = gdal.Open(fn2,GA_ReadOnly)
cols = inDataset1.RasterXSize
rows = inDataset1.RasterYSize    
bands = inDataset1.RasterCount
cols2 = inDataset2.RasterXSize
rows2 = inDataset2.RasterYSize    
bands2 = inDataset2.RasterCount
if (rows != rows2) or (cols != cols2) or (bands != bands2):
    sys.stderr.write("Size mismatch")
    sys.exit(1)
if pos is None:
    pos = range(bands)
else:
    bands = len(pos) 
if dims is None:
    x0 = 0
    y0 = 0
else:
    x0 = dims[0]
    y0 = dims[2]  
    cols = dims[1]-dims[0] + 1  
    rows = dims[3]-dims[2] + 1                       
# initial weights
wt = ones(cols*rows)      
# data array (transposed so observations are columns)
dm = zeros((2*bands,cols*rows),dtype='float32')
k = 0
for b in pos:
    band1 = inDataset1.GetRasterBand(b+1)
    band1 = band1.ReadAsArray(x0,y0,cols,rows).astype(float)
    dm[k,:] = ravel(band1)
    band2 = inDataset2.GetRasterBand(b+1)
    band2 = band2.ReadAsArray(x0,y0,cols,rows).astype(float)        
    dm[bands+k,:] = ravel(band2)
    k += 1
print '========================='
print '       iMAD'
print '========================='
print 'time1: '+fn1
print 'time2: '+fn2   
print 'Delta    [canonical correlations]'   
# iteration of MAD        
delta = 1.0
oldrho = zeros(bands)
iter = 0
while (delta > 0.001) and (iter < 50):    
#     weighted covariance matrices and means 
    sigma,means = covw(dm,wt)          
    s11 = mat(sigma[0:bands,0:bands])
    s22 = mat(sigma[bands:,bands:]) 
    s12 = mat(sigma[0:bands,bands:])
    s21 = mat(sigma[bands:,0:bands])
#     solution of generalized eigenproblems
    s22i = mat(linalg.inv(s22))
    lama,a = linalg.eig(s12*s22i*s21,s11) 
    s11i = mat(linalg.inv(s11))    
    lamb,b = linalg.eig(s21*s11i*s12,s22) 
#     sort a   
    idx = argsort(lama)
    a = a[:,idx]
#     sort b         
    idx = argsort(lamb)
    b = b[:,idx]           
#     canonical correlations        
    rho = sqrt(real(lamb[idx]))             
#     normalize dispersions   
    a = mat(a)
    tmp1 = a.T*s11*a
    tmp2 = 1./sqrt(diag(tmp1))
    tmp3 = tile(tmp2,(bands,1))
    a = multiply(a,tmp3)
    b = mat(b) 
    tmp1 = b.T*s22*b
    tmp2 = 1./sqrt(diag(tmp1))
    tmp3 = tile(tmp2,(bands,1))
    b = multiply(b,tmp3)
#     assure positive correlation
    tmp = diag(a.T*s12*b)
    b = b*diag(tmp/abs(tmp))
#     canonical and MAD variates
    U = a.T*mat(dm[0:bands,:]-means[0:bands,:])    
    V = b.T*mat(dm[bands:,:]-means[bands:,:])           
    MAD = U-V  
#     new weights        
    var_mad = tile(mat(2*(1-rho)).T,(1,rows*cols))    
    chisqr = sum(multiply(MAD,MAD)/var_mad,0)
    wt = 1-stats.chi2.cdf(chisqr,[bands])
#     continue iteration         
    delta = sum(abs(rho-oldrho))
    oldrho = rho
    print delta
    iter += 1   
# write results to disk
driver = inDataset1.GetDriver()    
outDataset = driver.Create(outfn,cols,rows,bands+1,GDT_Float32)
projection = inDataset1.GetProjection()
geotransform = inDataset1.GetGeoTransform()
if geotransform is not None:
    gt = list(geotransform)
    gt[0] = gt[0] + x0*gt[1]
    gt[3] = gt[3] + y0*gt[5]
    outDataset.SetGeoTransform(tuple(gt))
if projection is not None:
    outDataset.SetProjection(projection)        
for k in range(bands):        
    outBand = outDataset.GetRasterBand(k+1)
    outBand.WriteArray(resize(MAD[k,:],(rows,cols)),0,0) 
    outBand.FlushCache()
outBand = outDataset.GetRasterBand(bands+1)    
outBand.WriteArray(resize(chisqr,(rows,cols)),0,0) 
outBand.FlushCache()    
outDataset = None
inDataset1 = None
inDataset2 = None  
print 'result written to: '+outfn
print '---------------------------------'     

if name == 'main': main()

Community
  • 1
  • 1
JEquihua
  • 1,217
  • 3
  • 20
  • 40
  • 2
    What are the symptoms of the crash? Are you sure it's not just thrashing swap? – the paul May 18 '12 at 19:02
  • I'm sorry but I don't know what thrashing swap is. Im kind of new at this. Everything just freezes. Cant do anything anymore but turn off the computer. – JEquihua May 18 '12 at 19:46
  • Can you be a bit more specific? Does the mouse pointer move? When you hit the numlock key on the keyboard, does the indicator light on your keyboard toggle? Is the hard disk activity light blinking? Can you ALT+TAB? – Burhan Khalid May 18 '12 at 19:57
  • Pointer dissapears. Can't alt+tab. Hard disk activity stops. Really nasty stuff. But works perfectly with less pixels. – JEquihua May 18 '12 at 20:17

1 Answers1

1

It sounds like this operation simply takes more memory than your computer can provide. This is an oversimplification, but when the system runs out of actual RAM to use, it sometimes writes sections of memory which appear to be being used less onto your hard disk, so it can use that real memory for something else. The hard disk is many orders of magnitude slower than main memory, so when your software needs parts of memory which have been written to the disk, everything can get real slow. When this happens at a dramatic scale, and parts of your software and operating system are constantly trying to use pieces of memory which have been swapped out (written to disk), your hard drive can get a major workout trying to seek back and forth, writing lots of stuff and reading lots of stuff and writing more stuff, etc. The system can become quite unresponsive in a situation like this.

You could see if this is really what's happening by watching your system's activity monitors (I forget what they're called on Windows, but I know they're there; some software that shows you how much memory is allocated, being used, etc, and draws a nice graph for you). While watching those, start up your program and watch what the memory allocation rate looks like.

There are probably some ways to alleviate memory usage in this code, if fewer things are kept in memory at a time, but I don't see offhand what they are. You could also add more RAM to your system in hopes that it would fix this.

the paul
  • 8,972
  • 1
  • 36
  • 53
  • Makes a lot of sense. Is there a sound way to track which part of the code makes it finally die? I mean, no just trial and error, as my computer dies so hard it would take forever. Is there any other idea on how to handle this amount of data besides just throwing more ram at the problem? greetings and thank you... – JEquihua May 18 '12 at 20:16
  • If you really are seeing swap thrash, there isn't any one part of the code that's causing it. Anything that needs significant amounts of memory is partly responsible. You could try making the code output periodic status messages to get a feel for how far it gets, or run the code under a Python profiler that could help you identify where most of the memory is. – the paul May 18 '12 at 20:28
  • OK, found it! everything goes to hell when this operation is performed: covmat = dmc*dmc.T/sumw where dmc is a 10x25000000 matrix and sumw is a scalar. Any ideas? – JEquihua May 19 '12 at 01:24
  • any idea on how to look at the sourcecode for * ? thank you for all. – JEquihua May 19 '12 at 01:48
  • Nothing comes to mind immediately; that looks like a normal matrix multiplication, which shouldn't have any great need for significant extra memory beyond the space needed for the result. But then, I'm not positive what the .T attribute is/does. Transpose? The * code can be identified based on the types of the arguments. You might try breaking up the operations there to be on separate lines: dmct=dmc.T; comvat=dmc*dmct; comvat=comvat/sumw; something like that. – the paul May 19 '12 at 16:46