0

I'm working with a pandas MultiIndex that has three levels:
[Verbundzuordnung, ProjektIndex, Datum].

I would like to resample the dataframe hourly on Datum, but doing so drops the right-hand column TagDesAbdichtens. I would like to keep it, as it is static.

            
Verbundzuordnung    ProjektIndex    Datum                           TagDesAbdichtens
1                   81679           2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
...     ...     ...     ...
2                   94574           2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31

285192 rows × 1 columns

There are additional columns that I left out here for easier comprehension.

I am currently applying this to resample the dataframe:

all_merged = all_merged.groupby([
    pd.Grouper(level='Verbundzuordnung'), 
    pd.Grouper(level='ProjektIndex'), 
    pd.Grouper(level='Datum', freq='H')]
  )

all_merged.mean() gives me the wanted output, but with TagDesAbdichtens missing. This value is unique and static for each combination of Verbundzuordnung and ProjektIndex, and I would like to have it back in the resampled version.

Is there a way to do it with native pandas functions?

Krotonix

2 Answers

0

I've had success resampling using the native resample function. For example,

    # Aggregation function to apply to each column
    resample_dict = {
        'Verbundzuordnung': 'mean',
        'ProjektIndex': 'mean',
        'TagDesAbdichtens': 'first',
    }

    # "60T" means 60 minutes, i.e. hourly bins
    data = data.resample("60T", closed='left', label='left').apply(resample_dict)

You can apply whichever aggregation function you need to each column in place of mean (e.g. first, min, max).

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html for more.
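As a minimal sketch of the per-column aggregation described above, the following uses a small made-up frame with a DatetimeIndex. The column name `value` and the sample data are assumptions for illustration only; `"60min"` is used as the frequency string because it is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical toy data: two readings per hour on a tz-aware DatetimeIndex.
idx = pd.date_range("2021-11-10", periods=4, freq="30min", tz="UTC")
data = pd.DataFrame(
    {
        "value": [1.0, 3.0, 5.0, 7.0],           # hypothetical numeric column
        "TagDesAbdichtens": ["2021-12-08"] * 4,  # static column to carry through
    },
    index=idx,
)

# Average the numeric column per hour, keep the first value of the static one.
hourly = data.resample("60min", closed="left", label="left").agg(
    {"value": "mean", "TagDesAbdichtens": "first"}
)
```

Note that this form assumes the datetime values are the frame's index, not one level of a MultiIndex as in the question.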

elliot
  • Thank you so much, this really solved my problem. I was also trying to use merge and join, but that made it much more complicated. I actually assigned mean to each column and set first only for the `TagDesAbdichtens` feature. – Krotonix Dec 04 '22 at 13:51
0

Instead of mean(), you can do the following:

agg({'TagDesAbdichtens': 'first', 'another_col': 'mean', 'another_col2': 'mean', ... })

That is, you can specify a different aggregate function for each column.
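Applied to the grouping from the question, this could look roughly like the sketch below. The toy data and the column name `value` are assumptions made up for illustration. Building the dict from `all_merged.columns` is one possible way to avoid listing every column by hand.

```python
import pandas as pd

# Hypothetical toy frame mimicking the question's three-level MultiIndex.
dates = pd.to_datetime(
    ["2021-11-10 00:00", "2021-11-10 00:30", "2021-11-10 01:00"], utc=True
)
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1], [81679, 81679, 81679], dates],
    names=["Verbundzuordnung", "ProjektIndex", "Datum"],
)
all_merged = pd.DataFrame(
    {"value": [1.0, 3.0, 5.0], "TagDesAbdichtens": ["2021-12-08"] * 3},
    index=index,
)

# Default every column to "mean", then override the static one with "first".
agg_spec = {col: "mean" for col in all_merged.columns}
agg_spec["TagDesAbdichtens"] = "first"

hourly = all_merged.groupby(
    [
        pd.Grouper(level="Verbundzuordnung"),
        pd.Grouper(level="ProjektIndex"),
        pd.Grouper(level="Datum", freq="60min"),  # hourly bins
    ]
).agg(agg_spec)
```

The result keeps the three index levels, with `Datum` floored to the hour and `TagDesAbdichtens` preserved alongside the averaged columns.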

radof
  • That's exactly what I am using now, but it's kind of sad that there is no option to just ignore a column, as this is not really intuitive if you are new to pandas. – Krotonix Dec 06 '22 at 23:52