-1

I have this df:

            Timestamp        List     Power    Energy     Status
0 2020-01-01 01:05:50   [5, 5, 5]      7000     15000     online
1 2020-01-01 01:06:20   [6, 6, 6]      7500     16000     online
2 2020-01-01 01:08:30   [0, 0, 0]         5         0    offline
...

no i want to resample it. Use .resample as following:

df2 = df.set_index('timestamp').resample('min').?

i want the df in 1min - intervalls. To each intervall i want to match with the rows as follows: List: if status = online: last entry of the intervall else '0'; Power: if status = online: the mean value of the intervall else '0'; Energy: if status = online: last entry of the intervall else '0; Status: the last status of the intervall;

how do i fill the NaN values, which .resample outputs, if there is no data in df? E.g. no data for an interval, then the df should be filled as follows Power = 0; Energy = 0; status = offline;...

I tried something like that:

df2 = df.set_index('Timestamp').resample('T').agg({'List':'last',
                                                   'Power':'mean',
                                                   'Energy':'last',
                                                   'Status':'last'})

and got:

         Timestamp        List                      Power    Energy     Status
0 2020-01-01 01:05   [5, 5, 5]  (average of the interval)     15000     online
1 2020-01-01 01:06   [6, 6, 6]  (average of the interval)     16000     online
2 2020-01-01 01:07         NaN                        NaN       NaN        NaN
3 2020-01-01 01:08   [0, 0, 0]                          5         0    offline

Expected outcome:

         Timestamp        List                      Power    Energy     Status
0 2020-01-01 01:05   [5, 5, 5]  (average of the interval)     15000     online
1 2020-01-01 01:06   [6, 6, 6]  (average of the interval)     16000     online
2 2020-01-01 01:07   [0, 0, 0]                          0         0    offline
3 2020-01-01 01:08   [0, 0, 0]                          5         0    offline
Vini
  • 125
  • 1
  • 8

1 Answers1

1

There is no way to pass fillna rule to separately handle each column NA values during .resample().agg() as viewed in docs https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html

In your case even interpolation does not work, so, try to manually handle each column NA values

Firstly, let's initialize your sample frame.

import pandas as pd

data = {"Timestamp":{"0": "2020-01-01 01:05:50",
                     "1": "2020-01-01 01:06:20",
                     "2": "2020-01-01 01:08:30"},
        "List": {"0": [5, 5, 5],
                 "1": [6, 6, 6],
                 "2": [0, 0, 0]},
        "Power": {"0": 7000,
                 "1": 7500,
                 "2": 5},
        "Energy": {"0": 15000,
                   "1": 16000,
                   "2": 0},
        "Status": {"0": "online",
                   "1": "online",
                   "2": "offline"},
       }

df = pd.DataFrame(data)

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df = df.set_index('Timestamp').resample('T').agg({'List':'last',
                                                   'Power':'mean',
                                                   'Energy':'last',
                                                   'Status':'last'})

Now we can manually replace NA in each column separately

df["List"] = df["List"].fillna("[0, 0, 0]")
df["Status"] = df["Status"].fillna('offline')
df = df.fillna(0)

or more convenient dict way to do it

values = {
          'List': '[0, 0, 0]',
          'Status': 'offline', 
          'Power': 0, 
          'Energy': 0
}

df = df.fillna(value=values)
Timestamp   List    Power   Energy  Status
0   2020-01-01 01:05:00     [5, 5, 5]   7000.0  15000.0     online
1   2020-01-01 01:06:00     [6, 6, 6]   7500.0  16000.0     online
2   2020-01-01 01:07:00     [0, 0, 0]   0.0     0.0     offline
3   2020-01-01 01:08:00     [0, 0, 0]   5.0     0.0     offline
Ivan Shelonik
  • 1,958
  • 5
  • 25
  • 49
  • Thanks for your advise, but my df has about 1Mio. lines - so replace the NaN-values manually would not work. – Vini Aug 10 '20 at 06:40
  • @Vini 1 Million rows are not huge **.fillna()** has to work fine even with 100 columns. Pay attention that .fiilna() is cython optimized function and not python iterator :) – Ivan Shelonik Aug 10 '20 at 07:30
  • Now i have a new colum 'ID' and have to fullfill the Nan-values under conditions: the condition for filling the nan values is more complicated than expected: I have different cycles which are marked by different IDs. Within a cycle (ID appears both before and after the nan value) the power of the two "surrounding" lines should be averaged and in the column energy the last existing value of the column energy should be entered. Outside the cycle (ID before != next ID) the power as well as the energy should be set to 0. – Vini Aug 10 '20 at 07:58
  • @VIni If you have complicated logic for filling NaN and want it to be fast use cython as written here. So you write apply lambda function and wrap it in cython https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html – Ivan Shelonik Aug 10 '20 at 10:33
  • Thanks for the tipp! i will try it. – Vini Aug 10 '20 at 12:55