
I have a workflow in xarray where I group an object with groupby() and then use map() to apply a function that converts each group to a pandas DataFrame and performs some calculations.

I would like to rename a column in the output DataFrame using the name (label) of the group, similar to how you would in this example: pandas apply example
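For comparison, here is a minimal sketch of the pandas behaviour that example presumably relies on: inside `groupby(...).apply`, pandas sets each group's `.name` to its group key. The toy frame is made up for illustration.

```python
import pandas as pd

# Toy frame mirroring the "letters" grouping used below (illustrative data)
df = pd.DataFrame({"letters": list("abba"), "foo": [1.0, 2.0, 3.0, 4.0]})

# Inside apply, pandas pins each group's key onto the group as `.name`,
# so the function can read the label directly
labels = df.groupby("letters")["foo"].apply(lambda s: s.name)
print(labels.tolist())  # ['a', 'b']
```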

However, the groups that xarray passes to the function do not seem to expose the group label through a 'name' attribute, so I can't use that example's code as-is. I also tried group.labels, but that did not work either.
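A quick way to see what the function actually receives, assuming the dummy data from the workflow below: each group is an ordinary DataArray whose `.name` is the variable name (`"foo"`), while the grouping coordinate `letters` travels along with it, so the label can be read from that coordinate. The helper `inspect_group` and the `seen` list are hypothetical, for illustration only.

```python
import numpy as np
import xarray as xr

# Same dummy data as in the question
ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]

seen = []  # hypothetical: records (name, label) for each group


def inspect_group(group):
    # group.name is the variable name ("foo"), not the group label.
    # The grouping coordinate is kept on the group, and every element of
    # group["letters"] equals this group's label.
    label = str(group["letters"].values[0])
    seen.append((group.name, label))
    return group


arr.groupby("letters").map(inspect_group)
print(seen)  # [('foo', 'a'), ('foo', 'b')]
```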

General workflow for groupby().map()

# Imports
import numpy as np
import xarray as xr

# Dummy Dataset
ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# Select the "foo" variable as a DataArray
arr = ds["foo"]

# User defined function
def standardize(x):
    return (x - x.mean()) / x.std()

# Apply the function to each group
arr.groupby('letters').map(standardize)
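Since map() is essentially a loop over the groups (as a commenter notes below), the same result can be built by hand. A sketch, repeating the dummy data and `standardize` so it runs on its own: call the function on each group, concatenate along `x`, and restore the original order. Here `x` is already sorted, so `sortby("x")` is enough to restore it; the names `pieces`, `manual`, and `mapped` are illustrative.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]


def standardize(x):
    return (x - x.mean()) / x.std()


# Call the function once per group (iteration yields (label, group) pairs)
pieces = [standardize(group) for _, group in arr.groupby("letters")]
# Concatenate along the grouped dimension, then restore the original order
manual = xr.concat(pieces, dim="x").sortby("x")

mapped = arr.groupby("letters").map(standardize)
xr.testing.assert_allclose(manual, mapped)
```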

Functionality I'm shooting for (does not work as written)

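def to_df_and_rename(grouped_array):
    # Convert the group (a DataArray) to a DataFrame
    df_out = grouped_array.to_dataframe()
    # Keep only the column of interest ("desired_col" is a placeholder)
    df_out = df_out[["desired_col"]]
    # Intended: rename the column to the group label. However,
    # grouped_array.name is the variable name, not the group label
    df_out = df_out.rename(columns={"desired_col": grouped_array.name})
    return df_out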
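One possible way to get the intended behaviour, sketched under assumptions: the label is read from the grouping coordinate (`letters`) instead of `.name`, `"foo"` stands in for the placeholder `"desired_col"`, and the calculation is a mean over `x` (a hypothetical choice). Because map() combines the per-group results back into an xarray object, the sketch loops over the groups explicitly and joins the DataFrames with `pd.concat`.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]


def to_df_and_rename(grouped_array):
    # Recover this group's label from the grouping coordinate, not .name
    label = str(grouped_array["letters"].values[0])
    # Hypothetical calculation: average over "x" within the group
    df_out = grouped_array.mean("x").to_dataframe()
    # "foo" stands in for the question's placeholder "desired_col"
    df_out = df_out[["foo"]]
    return df_out.rename(columns={"foo": label})


# One DataFrame per group, joined side by side on the "y" index
result = pd.concat(
    [to_df_and_rename(group) for _, group in arr.groupby("letters")],
    axis=1,
)
print(result)
```

Each column is named after its group (`a`, `b`), and the rows are indexed by `y`.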

Thanks.

Adam Kemberling
  • What’s the output you’re expecting? Does iterating over `(label, group)` values with `da.groupby('x').groups` work? See http://xarray.pydata.org/en/stable/groupby.html – Maximilian Apr 09 '20 at 04:24
  • So the original goal was to return a single pandas DataFrame with a column (or index) for time and one column per group holding that group's mean, each column named after the group rather than the variable. The original approach used a loop over (label, group) values, but that took a very long time, so I was looking into ways to vectorize it using a user-defined function and array.map. – Adam Kemberling Apr 09 '20 at 13:09
  • OK. `map` doesn't vectorize: it's sugar for a normal loop. Vectorizing in Python is hard. – Maximilian Apr 10 '20 at 16:45
  • Okay that is good to know. I know R much better and find myself trying to force those workflows into my python work. I was able to find a solution that worked and will post it as a solution for others when I am able to process it down to a simpler version. Thanks! – Adam Kemberling Apr 10 '20 at 17:37
  • Yeah, I hear you on the transition. Best of luck! – Maximilian Apr 10 '20 at 18:33
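For the goal described in the comments (one column per group holding that group's mean, named after the group), a built-in groupby reduction avoids calling a Python function per group. A sketch using the dummy data, where `y` stands in for the time dimension (an assumption about the real data): `groupby("letters").mean("x")` returns one entry per label, and `to_pandas()` turns the 2-D result into a DataFrame. Depending on the xarray version and whether the optional `flox` package is installed, xarray may compute such grouped reductions with optimized code rather than a per-group loop.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]

# Mean over "x" within each group: result has dims "letters" and "y"
group_means = arr.groupby("letters").mean("x")

# Rows: "y" (standing in for time); columns: one per group label
wide = group_means.transpose("y", "letters").to_pandas()
print(wide)
```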
