4

A pandas dataframe :

>>> df
                  sales  net_pft
STK_ID RPT_Date                 
002138 20140930   3.325    0.607
       20150930   3.619    0.738
       20160930   4.779    0.948
600004 20140930  13.986    2.205
       20150930  14.226    3.080
       20160930  15.499    3.619
600660 20140930  31.773    5.286
       20150930  31.040    6.333
       20160930  40.062    7.186

Just want to know how to get the output as the each row's value is divided by its first row of each group, like this:

                  sales  net_pft
STK_ID RPT_Date                 
002138 20140930   1.000    1.000
       20150930   1.088    1.216
       20160930   1.437    1.562
600004 20140930   1.000    1.000
       20150930   1.017    1.397
       20160930   1.108    1.641
600660 20140930   1.000    1.000
       20150930   0.977    1.198
       20160930   1.261    1.359

Thanks,

vestland
  • 55,229
  • 37
  • 187
  • 305
bigbug
  • 55,954
  • 42
  • 77
  • 96

1 Answers1

6
import pandas as pd

df = pd.DataFrame({'RPT_Date': ['20140930', '20150930', '20160930', '20140930', '20150930', '20160930', '20140930', '20150930', '20160930'], 'STK_ID': ['002138', '002138', '002138', '600004', '600004', '600004', '600660', '600660', '600660'], 'net_pft': [0.607, 0.738, 0.948, 2.205, 3.080, 3.619, 5.286, 6.333, 7.186], 'sales': [3.325, 3.619, 4.779, 13.986, 14.226, 15.499, 31.773, 31.040, 40.062]})
df = df.set_index(['STK_ID','RPT_Date'])

firsts = (df.groupby(level=['STK_ID']).transform('first'))
result = df / firsts

yields

                  net_pft     sales
STK_ID RPT_Date                    
002138 20140930  1.000000  1.000000
       20150930  1.215815  1.088421
       20160930  1.561779  1.437293
600004 20140930  1.000000  1.000000
       20150930  1.396825  1.017160
       20160930  1.641270  1.108180
600660 20140930  1.000000  1.000000
       20150930  1.198070  0.976930
       20160930  1.359440  1.260882

The main trick above is to use groupby/transform('first') to create a DataFrame which is the same shape as df but whose values come from the first row of each group:

firsts = df.groupby(level=['STK_ID']).transform('first')
#                  net_pft   sales
# STK_ID RPT_Date                 
# 002138 20140930    0.607   3.325
#        20150930    0.607   3.325
#        20160930    0.607   3.325
# 600004 20140930    2.205  13.986
#        20150930    2.205  13.986
#        20160930    2.205  13.986
# 600660 20140930    5.286  31.773
#        20150930    5.286  31.773
#        20160930    5.286  31.773

Although this is a profligate use of memory, this is likely the quickest way to obtain the desired result since it avoids looping through the groups in Python.


If the above code raises a TypeError: Transform function invalid for data types in Pandas version 0.13, you could try using this workaround:

result = list()
for key, grp in df.groupby(level=['STK_ID']):
    result.append(grp/grp.iloc[0])
result = pd.concat(result)
print(result)
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • I use pandas 13.0, meet problem like: >>> first = df.groupby(level=['STK_ID']).transform('first') Traceback (most recent call last): File "", line 1, in File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2319, in transform return self._transform_item_by_item(obj, fast_path) File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2386, in _transform_item_by_item raise TypeError('Transform function invalid for data types') TypeError: Transform function invalid for data types – bigbug Nov 22 '16 at 02:02
  • What are the dtypes of the columns in `df`? Please post `df.info()`. – unutbu Nov 22 '16 at 02:05
  • >>> df.info() MultiIndex: 9 entries, (002138, 20140930) to (600660, 20160930) Data columns (total 2 columns): sales 9 non-null float64 net_pft 9 non-null float64 dtypes: float64(2)>>> – bigbug Nov 22 '16 at 02:08
  • >>> pd.__version__ '0.13.1' >>> – bigbug Nov 22 '16 at 02:14
  • Thank you. but not a feasible solution to upgrade pandas at current stage. I need to figure out another way. – bigbug Nov 22 '16 at 02:24