I'm working with a dataframe containing environmental values (NDVI from the Sentinel-2 satellite), like:

        Date    ID_151894   ID_109386   ID_111656   ID_110006   ID_112281   ID_132408
0   2015-07-06  0.82    0.61    0.85    0.86    0.76    nan
1   2015-07-16  0.83    0.81    0.77    0.83    0.84    0.82
2   2015-08-02  0.88    0.89    0.89    0.89    0.86    0.84
3   2015-08-05  nan     nan     0.85    nan     0.83    0.77
4   2015-08-12  0.82    0.77    nan     0.65    nan     0.42
5   2015-08-22  0.85    0.85    0.88    0.87    0.83    0.83

The columns correspond to different places, and the nan values are due to cloudy conditions (which happen often in Belgium). There are obviously a lot more values. To remove outliers, I use the method described in the TIMESAT manual (Jönsson & Eklundh, 2015), in which a value is treated as an outlier if:

  1. it deviates from the median by more than a maximum deviation (here called the cutoff)
  2. it is lower than the mean of its immediate neighbors minus the cutoff, or it is larger than the highest of its immediate neighbors plus the cutoff

So I wrote the code below:

import pandas as pd

NDVI = pd.read_excel("C:/Python_files/Cartofor/NDVI_frene_5ha.xlsx")
date = NDVI["Date"]
NDVIF = NDVI.copy()  # filtered copy; column 0 is Date, the rest are NDVI columns
MED = NDVI.median(axis=0, skipna=True, numeric_only=True)
SD = NDVI.std(axis=0, skipna=True, numeric_only=True)
cutoff = 1.5 * SD

for j in range(1, NDVIF.shape[1]):  # data columns (skip Date)
    col = NDVIF.columns[j]  # MED and cutoff are indexed by column name, not position
    for i in range(1, NDVIF.shape[0] - 1):  # rows that have both neighbors
        value = NDVIF.iloc[i, j]
        prev_value, next_value = NDVIF.iloc[i - 1, j], NDVIF.iloc[i + 1, j]
        if value < (prev_value + next_value) / 2 - cutoff[col]:  # 2) too low
            NDVIF.iloc[i, j] = float("nan")  # assign with =, not ==
        elif value > max(prev_value, next_value) + cutoff[col]:  # 2) too high
            NDVIF.iloc[i, j] = float("nan")
        elif abs(value - MED[col]) > cutoff[col]:  # 1) too far from the median
            NDVIF.iloc[i, j] = float("nan")

The problem is that I need to omit the NaN values from the calculations: as soon as a neighbor is NaN, the comparisons above no longer work. The goal is to end up with a dataframe like the one above, but without the outliers.
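
One possible way to leave the NaN values out is sketched below. It assumes that "neighbors" should mean the nearest valid observations before and after a value, which the post does not state explicitly. The function name `remove_outliers` and its parameters `date_col` and `factor` are made up for this illustration. Unlike the loop above, it tests every value against the original data in a single pass rather than updating the column while iterating.

```python
import numpy as np
import pandas as pd


def remove_outliers(df, date_col="Date", factor=1.5):
    """Return a copy of df in which flagged outliers are replaced by NaN."""
    out = df.copy()
    for col in df.columns.drop(date_col):
        valid = df[col].dropna()  # only real observations take part
        cutoff = factor * valid.std()

        # Nearest valid observation before and after each value.
        neighbors = pd.concat([valid.shift(1), valid.shift(-1)], axis=1)

        # skipna=False: the first and last observations have only one
        # neighbor, so these two tests evaluate to False for them.
        too_low = valid < neighbors.mean(axis=1, skipna=False) - cutoff
        too_high = valid > neighbors.max(axis=1, skipna=False) + cutoff
        off_median = (valid - valid.median()).abs() > cutoff

        flagged = too_low | too_high | off_median
        out.loc[flagged[flagged].index, col] = np.nan
    return out
```

With the dataframe from the question this would be called as `NDVIF = remove_outliers(NDVI)`; the rows that were already NaN stay NaN and the input dataframe is left unchanged.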

Once this is done, I have to interpolate the values onto a new, chosen time index (e.g. one value per day, or one value every five days, from 2016 to 2020) and write each interpolated column to a txt file so that it can be loaded into the TIMESAT software.
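
The interpolation step could look roughly like the sketch below. It assumes linear interpolation in time is acceptable and that the dates are unique. The names `resample_ndvi` and `write_columns` are hypothetical, and the one-value-per-line output is only a placeholder: the exact file layout that TIMESAT expects is not covered here and would need to be checked against its manual.

```python
import pandas as pd


def resample_ndvi(df, start, end, freq="5D", date_col="Date"):
    """Interpolate every column linearly in time onto a regular date index."""
    ts = df.set_index(pd.to_datetime(df[date_col])).drop(columns=date_col)
    new_index = pd.date_range(start, end, freq=freq)

    # Add the target dates, interpolate on the combined index,
    # then keep only the target dates.
    combined = ts.reindex(ts.index.union(new_index))
    # limit_area="inside" avoids extrapolating before the first or
    # after the last valid observation of a column.
    filled = combined.interpolate(method="time", limit_area="inside")
    return filled.loc[new_index]


def write_columns(df, folder="."):
    """Write each column to its own text file, one value per line."""
    for col in df.columns:
        df[col].to_csv(f"{folder}/{col}.txt", index=False, header=False)
```

For example, `resample_ndvi(NDVIF, "2016-01-01", "2020-12-31", freq="5D")` would give one value every five days, and `freq="D"` one value per day. Target dates that fall outside the observed range of a column stay NaN.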

I hope my English is not too bad, and thank you for your answers! :)