1
a = pd.DataFrame(df.groupby('actor_1_name')['gross'].sum())
b = pd.DataFrame(df.groupby('actor_2_name')['gross'].sum())
c = pd.DataFrame(df.groupby('actor_3_name')['gross'].sum())

x = [a,b,c]
y = pd.concat(x)

p =['actor_1_name','actor_2_name','actor_3_name','gross']
df.loc[y.nlargest(3).index,p]

I want to find the sum of each column then combine them together to find the top 3 highest values, but I'm getting an error and not sure what to do to fix it. I need some assistance. Error Message

enter image description here

1 Answers1

0

I believe you need:

df = pd.DataFrame({'actor_1_name':['a','a','a','b','b','c','c','d','d','e'],
                   'actor_2_name':['d','d','a','c','b','c','c','d','e','e'],
                   'actor_3_name':['c','c','a','b','b','b','c','e','e','e'],
                   'gross':[1,2,3,4,5,6,7,8,9,10]})
print (df)
  actor_1_name actor_2_name actor_3_name  gross
0            a            d            c      1
1            a            d            c      2
2            a            a            a      3
3            b            c            b      4
4            b            b            b      5
5            c            c            b      6
6            c            c            c      7
7            d            d            e      8
8            d            e            e      9
9            e            e            e     10

a = df.groupby('actor_1_name')['gross'].sum().nlargest(3)
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3)
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3)

x = [a,b,c]
print (x)
[actor_1_name
d    17
c    13
e    10
Name: gross, dtype: int64, actor_2_name
e    19
c    17
d    11
Name: gross, dtype: int64, actor_3_name
e    27
b    15
c    10
Name: gross, dtype: int64]

df1 = pd.concat(x, axis=1, keys=['actor_1_name','actor_2_name','actor_3_name'])
print (df1)
   actor_1_name  actor_2_name  actor_3_name
b           NaN           NaN          15.0
c          13.0          17.0          10.0
d          17.0          11.0           NaN
e          10.0          19.0          27.0

EDIT1:

a = df.groupby('actor_1_name')['gross'].sum().nlargest(3).reset_index()
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3).reset_index()
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3).reset_index()

x = [a,b,c]
print (x)
[  actor_1_name  gross
0            d     17
1            c     13
2            e     10,   actor_2_name  gross
0            e     19
1            c     17
2            d     11,   actor_3_name  gross
0            e     27
1            b     15
2            c     10]

df1 = pd.concat(x, axis=1, keys=['a','b','c'])
df1.columns = df1.columns.map('-'.join)
print (df1)
  a-actor_1_name  a-gross b-actor_2_name  b-gross c-actor_3_name  c-gross
0              d       17              e       19              e       27
1              c       13              c       17              b       15
2              e       10              d       11              c       10

EDIT2:

a = df.groupby('actor_1_name')['gross'].sum().nlargest(3).reset_index(drop=True)
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3).reset_index(drop=True)
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3).reset_index(drop=True)

x = [a,b,c]
print (x)
[0    17
1    13
2    10
Name: gross, dtype: int64, 0    19
1    17
2    11
Name: gross, dtype: int64, 0    27
1    15
2    10
Name: gross, dtype: int64]

df1 = pd.concat(x, axis=1, keys=['actor_1_name','actor_2_name','actor_3_name'])
print (df1)
   actor_1_name  actor_2_name  actor_3_name
0            17            19            27
1            13            17            15
2            10            11            10
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This shows 27 ;o just want them all combined to make 1 column and output the top 3 highest sum(gross) –  Jun 25 '17 at 06:05
  • See edited answer. you get nans, because `x` after concat are aligned by `acor names`, so values are reordered and if miss some category in some column `NaN` is added. – jezrael Jun 25 '17 at 06:16
  • how can I drop remove the NaN? –  Jun 25 '17 at 06:18
  • There are 2 possible solutions - remove acor names and use only numbers or create `6` columns. Give me a sec. – jezrael Jun 25 '17 at 06:21
  • I need the actor's name to be included ;o –  Jun 25 '17 at 06:27
  • So need edit1. becuse for concat need same indexes and in original solution all a,b,c outputs (Series) has different indexes. – jezrael Jun 25 '17 at 06:30
  • Oh didnt even see that xD. Thanks! –  Jun 25 '17 at 06:34