3

I've a Dataframe in this format:

| Department | Person | Power  | ... |
|------------|--------|--------|-----|
| ABC        | 1234   |  75    | ... |
| ABC        | 1235   |  25    | ... |
| DEF        | 1236   |  50    | ... |
| DEF        | 1237   | 100    | ... |
| DEF        | 1238   |  25    | ... |
| DEF        | 1239   |  50    | ... |

What I now want to get is the sum of occurrences for each value in the power column. How can I get this from my DataFrame?

| Department | 100 |  75 |  50 |  25 |
|------------|-----|-----|-----|-----|
| ABC        |   0 |   1 |   0 |   1 |
| DEF        |   1 |   0 |   2 |   1 |
Cœur
  • 37,241
  • 25
  • 195
  • 267
Gerrit
  • 2,515
  • 5
  • 38
  • 63

2 Answers2

4

You can use value_counts with sort_index, then generate DataFrame by to_frame and last transpose by T:

print (df.Power.value_counts().sort_index(ascending=False).to_frame().T)
       100  75   50   25 
Power    1    1    2    2

EDIT by comment:

You need crosstab:

print (pd.crosstab(df.Department, df.Power).sort_index(axis=1, ascending=False))
Power       100  75   50   25 
Department                    
ABC           0    1    0    1
DEF           1    0    2    1

Faster another solution with groupby and unstack:

print (df.groupby(['Department','Power'])
         .size()
         .unstack(fill_value=0)
         .sort_index(axis=1, ascending=False))

Power       100  75   50   25 
Department                    
ABC           0    1    0    1
DEF           1    0    2    1

If need groupby by columns Department and Person, add column Person to groupby to second position (thank you piRSquared):

print (df.groupby(['Department','Person', 'Power'])
         .size()
         .unstack(fill_value=0)
         .sort_index(axis=1, ascending=False))

Power              100  75   50   25 
Department Person                    
ABC        1234      0    1    0    0
           1235      0    0    0    1
DEF        1236      0    0    1    0
           1237      1    0    0    0
           1238      0    0    0    1
           1239      0    0    1    0

EDIT1 by comment:

If need add another missing values, use reindex:

print (df.groupby(['Department','Power'])
         .size()
         .unstack(fill_value=0)
         .reindex(columns=[100,75,50,25,0], fill_value=0))

Power       100  75   50   25   0  
Department                         
ABC           0    1    0    1    0
DEF           1    0    2    1    0
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, and how can I do it, if I have a second column on which I want to group the results? – Gerrit Aug 31 '16 at 13:13
  • Can you add desired output, sorry, I dont understand what means `group the results`. – jezrael Aug 31 '16 at 13:14
  • Thanks, I add 2 solutions. please check it. – jezrael Aug 31 '16 at 13:37
  • 1
    @Gerrit for jezrael's 2nd solution with `groupby`, notice how `Power` was the second column in the `groupby` and when `unstack`ed it ended up in the columns. If you want to have a second column on which to `groupby`, add it to the `groupby` like this: `df.groupby(['Department','Person','Power'])`. The rest of the answer follows as above. This is your best answer. – piRSquared Aug 31 '16 at 13:46
  • Thank you @piRSquared. I add it to answer. – jezrael Aug 31 '16 at 13:48
  • Is it possible to get `100`, `75`, `50`, `25`, `0` in the columns independently if there are all these values in my data? – Gerrit Aug 31 '16 at 14:04
  • There is only `5` values or more? – jezrael Aug 31 '16 at 14:06
  • I add solution, please check it. – jezrael Aug 31 '16 at 14:10
1

or it can be done this way:

>>> df.groupby(['Department','Power']).count().unstack().fillna(0)

           Person               
Power         25   50   75   100
Department                      
ABC           1.0  0.0  1.0  0.0
DEF           1.0  2.0  0.0  1.0
kekert
  • 968
  • 9
  • 19