
This is how I calculate the mean by the group:

I need to calculate the group mean of 'volatility', grouped by 'code_YM'. But the resulting mean volatility values, 'stock_volatility', are all NaN. Could you help me fix this?

The type of 'volatility' is float and the type of 'code_YM' is string. I don't know why the results are all NaN. I would like the mean values to be floats.

Moritz Ringler
  • Show code and other textual information as properly formatted text in the question, not as a comment, image, or external link. For the resulting table an image should be OK. – Michael Butscher Mar 14 '23 at 16:31
  • Does this answer your question? [How do I create a new column from the output of pandas groupby().sum()?](https://stackoverflow.com/questions/30244952/how-do-i-create-a-new-column-from-the-output-of-pandas-groupby-sum) – Minh-Long Luu Mar 14 '23 at 16:37
  • Use `transform`: `df['some_col'] = df.groupby('col')['other_col'].transform('mean')` – It_is_Chris Mar 14 '23 at 16:37
  • Your original `.mean()` produces a Series with the mean value for each group. You attempt to form a new DataFrame column from that (shorter) Series, but it cannot be assigned directly because the lengths differ. `.transform` is provided to map the grouped results back onto the original DataFrame. – user19077881 Mar 14 '23 at 17:00

1 Answer


As others suggested, this probably happens because groupby(...).mean() returns a shorter Series with one value per group, indexed by the group keys. Assigning it as a column aligns on the index, and since the group keys do not match the DataFrame's row labels, every row ends up as NaN.

The solution to this is using "transform":

    data_volatility['stock_v'] = data_volatility.groupby('code_YM')['volatility'].transform('mean')
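
To illustrate the difference, here is a minimal sketch with made-up data. Only the column names come from the question; the sample values and the `misaligned` column are assumptions for the demonstration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data_volatility = pd.DataFrame({
    "code_YM": ["A_2023-01", "A_2023-01", "B_2023-01", "B_2023-01"],
    "volatility": [0.10, 0.30, 0.20, 0.40],
})

# One value per group, indexed by the group key rather than by the row labels.
group_mean = data_volatility.groupby("code_YM")["volatility"].mean()

# Column assignment aligns on the index: row labels 0..3 do not match
# the group keys, so every row of this column is NaN.
data_volatility["misaligned"] = group_mean

# transform broadcasts each group's mean back onto the original rows.
data_volatility["stock_v"] = (
    data_volatility.groupby("code_YM")["volatility"].transform("mean")
)

print(data_volatility)
```

The `misaligned` column reproduces the all-NaN result from the question, while `stock_v` holds the group mean repeated on every row of its group.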

Other possibilities are:

  1. The "volatility" column contains NaN values. By default mean() skips NaN, so a group's mean comes out as NaN only when all of its values are NaN. Check how many NaN values that column has:

    print(pd.isna(data_volatility['volatility']).mean())
    

This prints the fraction of NaN/None values in that column. Make sure it is 0.

  2. The volatility column is not numeric. Perhaps you read it from a text file and the reading function did not parse it as numeric (it may be a string column).
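
Both checks can be combined in a short sketch. The sample data below is invented, and using `pd.to_numeric` with `errors="coerce"` is one possible way to do the conversion, not necessarily what your data needs.

```python
import pandas as pd

# Hypothetical data: "volatility" was read as text and has one unparseable entry.
data_volatility = pd.DataFrame({
    "code_YM": ["A_2023-01", "A_2023-01", "B_2023-01"],
    "volatility": ["0.10", "0.30", "n/a"],
})

# Anything other than a float dtype here means the column is not numeric.
print(data_volatility["volatility"].dtype)

# Convert to float; entries that cannot be parsed become NaN.
data_volatility["volatility"] = pd.to_numeric(
    data_volatility["volatility"], errors="coerce"
)

# Fraction of missing values after the conversion.
nan_fraction = data_volatility["volatility"].isna().mean()
print(nan_fraction)

# Group B has only NaN values, so its mean is NaN; group A gets a real mean.
data_volatility["stock_v"] = (
    data_volatility.groupby("code_YM")["volatility"].transform("mean")
)
print(data_volatility)
```

If the printed fraction is above 0, inspect the rows that failed to parse before relying on the group means.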
Matan Bendak