I am trying to do a 2D-surface fit on some imaging data. I attached an example of such data, which is basically a 1014 x 1014 array with substantial amount of noise. Example_image. Some patches of this array are invalid data, which I masked and set to NaN values, as shown in yellow in the Example image. As you can see in the image, there is a background gradient from left (brighter) to right (dimmer), which I am trying to remove. The gradient cannot be well fitted by a polynomial, hence my goal is to do a 2D-surface bivariate spline fit, and subtract the gradient off.
I have tried a number of tasks in scipy, but most of them do not return ideal result.
To start with I have tried the [RectBivariateSpline] Bivariate structured interpolation of large array with NaN values or mask), but since my image have NaNs in it, running RectBivariateSpline gives only an output of NaNs.
I have also tried SmoothBivariateSpline, which is the irregular gridded version of the task. I omitted those pixels that have NaN values and converted the rest into 1D arrays as input. But it failed as the array size is too big. I then tried to chop my array to try to run it on smaller chunks, but it gives the following error and quit with a segmentation fault, which I have no idea what it means.
fitpack2.py:1044: UserWarning: Error on entry, no approximation returned. The following conditions must hold: xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 If iopt==-1, then xb
I then tried to first filled in the NaN patches in my image with values using griddata with linear interpolation. Since the patches are huge, the interpolation is not ideal, but at least it gave me an array without NaN. I then use this array to run RectBivariateSpline again. But the output array is still NaNs.
I suspect that the noise in my image is screwing up the behaviour of both tasks, so I also tried to first run a Gaussian kernel on my image to smooth it, then filled in the NaN patches with griddata, then run RectBivariateSpline or SmoothBivariateSpline, but they still give me arrays with NaN values as output.
I am not sure that I understand the manual of both tasks correctly, so I attach the following script:
#!/usr/bin/python
import matplotlib
matplotlib.use('qt5agg')
#matplotlib.rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
#matplotlib.rc('text.latex', preamble=r'\usepackage{cmbright}')
#matplotlib.rc('text.latex', preamble=r'\usepackage[scaled]{helvet} \renewcommand\familydefault{\sfdefault} \usepackage[T1]{fontenc}')
#matplotlib.rc('text', usetex=True)
import matplotlib.pyplot as plt
import numpy as np
import astropy.io.fits as pyfits
import scipy.interpolate as sp
from astropy.convolution import convolve
from astropy.convolution import Gaussian2DKernel
#------------------------------------------------------------
#Read in the arrays
hdulistorg = pyfits.open('icmj01jrq_flt.fits')
hdulistorg.info()
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1)
hdulist = pyfits.open('jrq_sci_nan_deep.fits')
hdulist.info()
dataarrorg = np.swapaxes(hdulist[0].data, 0,1) #image array
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1) #error array
#Flag some of the problematic values, turn NaNs into 0 for easier handling
dataarr = np.copy(dataarrorg)
w=np.isnan(dataarr)
ww=np.where(dataarr == 0)
www=np.where(dataarr > 100)
wwww=np.where(dataarr < 0)
errarr = 1.0 / (np.copy(errarrorg)+1e-5) # Try to use 1/error as the estimate for weight below
errarr[w] = 0
errarr[ww] = 0
errarr[www] = 0
errarr[wwww]=0
dataarr[w]= 0
dataarr[ww]= 0
dataarr[www]=0
dataarr[wwww]=0
#Make a gaussian kernel smoothed data
maskarr = np.copy(errarr) #For masking the nan regions so they dun get smoothed
maskarr[:]=0
maskarr[w]=1
maskarr[ww]=1
maskarr[www]=1
maskarr[wwww]=1
gauss = Gaussian2DKernel(stddev=5)
condataarr = convolve(dataarr,gauss,normalize_kernel=True,boundary='extend',mask=maskarr)
condataarr[w]=0
conerrarr = np.copy(errarr)
#Setting x,y arrays for the Spline functions
nx, ny = (1014,1014)
x = np.linspace(0, 1013, nx)
y = np.linspace(0, 1013, ny)
xv, yv = np.meshgrid(x, y)
#Make an 1D version of these 2D arrays
dataarrflat = np.ravel(condataarr[0:200,0:200]) #Try only a small chunk!
xvflat = np.ravel(xv[0:200,0:200])
yvflat = np.ravel(yv[0:200,0:200])
errarrflat = np.ravel(conerrarr[0:200,0:200])
notnanloc = np.where(dataarrflat != 0) #Not NaNs
#SmoothBivariateSpline!
rect_S_spline = sp.SmoothBivariateSpline(xvflat[notnanloc], yvflat[notnanloc], dataarrflat[notnanloc],w=errarrflat[notnanloc], kx=3, ky=3)
#Also try using grid data to fix the grid?
gddataarr = np.copy(condataarr)
gddataarrflat = np.ravel(gddataarr)
gdloc = np.where(gddataarrflat != 0) #Not NaNs
gdxvflat = np.ravel(xv)
gdyvflat = np.ravel(yv)
xyarr = np.c_[gdxvflat[gdloc],gdyvflat[gdloc]]
x_grid, y_grid = np.mgrid[0:1013:1014j,0:1013:1014j]
grid_z2 = sp.griddata(xyarr, gddataarrflat[gdloc], (x_grid, y_grid), method='linear')
plt.imshow(grid_z2.T)
#plt.show()
#RectBivariatSpline
rect_B_spline = sp.RectBivariateSpline(x, y, grid_z2.T)
#Result grid (same as input for now)
xnew = np.arange(0, 1013, 1)
ynew = np.arange(0, 1013, 1)
znewS = rect_S_spline(xnew, ynew)
znewB = rect_B_spline(xnew, ynew)
print 'znewS', znewS
print 'znewB', znewB
#Write FITS files
condataarr = np.swapaxes(condataarr, 0, 1)
hdu2 = pyfits.PrimaryHDU(condataarr)
hdulist2 = pyfits.HDUList([hdu2])
hdulist2.writeto('contest.fits',overwrite=True)
hdulist2.close()
hdu3 = pyfits.PrimaryHDU(znewS)
hdulist3 = pyfits.HDUList([hdu3])
hdulist3.writeto('Stest.fits',overwrite=True)
hdulist3.close()