2

I am trying to de-trend my dataset of size 480x2040 = close to 1,000,000 pixels. I have 17 timesteps in this series (years) however I want to move to daily timesteps at some point. This code works, however runs way way way too slow to be functional.

I feel that scipy.signal.detrend can do the whole dataset, however I have some NaNs. In some cases, NaN are continents, which means there is a NaN at every timestep, however in some rarer cases, there is some missing data.

How can I detrend every pixel of my map over time, while ignoring/skipping the NaNs? It needs to be orders of magnitude quicker than this looping.

  for i in range(0,nlat):
    for j in range(0,nlon):
        pixel = ds[:,i,j]
        b = ~np.isnan(pixel)
        detrend[b,i,j]=signal.detrend(pixel[b])

Cheers!

Ocean Scientist
  • 411
  • 4
  • 16

1 Answers1

1

I'm not sure if it's possible to make signal.detrend skip NaNs on array basis. You can, however, use some method to fill your missing data and then detrend. If you have few missing data points, the effect of that should be negligible.

Since continents are always NaN, you can replace them with some special marker value, since you won't care about detrending them anyway. For the missing data NaNs, I suggest using pandas, which is great for timeseries, for various methods of filling missing data. I'll use the simplest here -- forward fill from previously known values -- but you can also use more sophisticated methods like taking the mean of adjacent values or interpolating.

import scipy.signal as signal
import numpy as np
import pandas as pd

''' Construct fake data'''
# construct some map-like values
pix = np.random.randint(256, size=(10,20))
ds = np.zeros((10, pix.shape[0], pix.shape[1]))
for i in range(1,11):
    ds[i-1,:,:] = i*pix

# 5% missing data
ds[(np.random.rand(10,10,20) < 0.05)] = np.nan
# a square continent
ds[:,1:3,5:10] = np.nan

''' Solution based on fake data'''
# assign some marker value to continents
ds[:, np.all(np.isnan(ds), axis=0)] = -1

df = pd.Panel(ds)
df.fillna(method='ffill', axis=0, inplace=True) # forward fill
df.fillna(method='bfill', axis=0, inplace=True) # backfill in case there are nans in first timestep

detrended = signal.detrend(df.values)
Matt
  • 282
  • 3
  • 13
  • Thanks for the reply. I'm not 100% convinced about the -1 marker for the NaN values but I will give it a go and see what happens! I'll probably need to make a continent mask so I can map them back onto my plot. – Ocean Scientist Jul 12 '18 at 23:05
  • @NicPittman the -1 was an example, you can use any other value for the continent NaNs that suits your data -- i.e. a value that you know you won't encounter otherwise. The missing data NaNs will of course not be -1 but set according to the fill method you choose with pandas. – Matt Jul 13 '18 at 11:12