1

I have some data of use of phone devices by gender which I'd like to in pandas.

I need to count the values and create a new column with those counts.

This is what the data df look like:

Sex     Apple   Samsung Huawei  Tecno
Male    Yes     Yes     No      No
Female  Yes     Yes     No      No
Female  Yes     Yes     No      No
Male    Yes     Yes     No      No
Male    No      Yes     No      No
Female  No      No      No      No
Female  Yes     Yes     No      No
Male    Yes     Yes     No      No
Male    Yes     Yes     No      No
Male    Yes     Yes     No      No
Female  Yes     Yes     No      No
Female  Yes     Yes     No      No
Female  Yes     Yes     No      No
Female  Yes     Yes     No      No
Female  No      Yes     No      No
Female  Yes     Yes     No      Yes
Male    Yes     Yes     No      No

This is what I want:

Sex     Response Apple  Samsung Huawei  Tecno
Male    Yes      6      7       0       0
        No       1      0       7       7
Female  Yes      8      9       0       1
        No       2      1       10      9

I'm spinning around in circles trying to get this to work and my code is so messy I'm a little ashamed to post it. I've at least started with this:

for name, group in df.groupby('Sex'):
    print(name)
    print(group)

I believe I can achieve this with some combination of groupby & unstack. Also if someone could find a worthwhile tutorial on groupby of multilevel data, I would appreciate it.

JAG2024
  • 3,987
  • 7
  • 29
  • 58

1 Answers1

1

Use:

df = (df.melt('Sex', value_name='Response')
       .groupby(['Sex', 'Response', 'variable'])
       .size()
       .unstack(fill_value=0)
       .rename_axis(None, axis=1))
print (df)
                 Apple  Huawei  Samsung  Tecno
Sex    Response                               
Female No            2      10        1      9
       Yes           8       0        9      1
Male   No            1       7        0      7
       Yes           6       0        7      0

Another similar solution:

df = (df.melt('Sex', value_name='Response')
       .groupby(['Sex', 'Response'])['variable']
       .value_counts()
       .unstack(fill_value=0)
       .rename_axis(None, axis=1))

Or:

df1 = df.melt('Sex', value_name='Response')
df = pd.crosstab([df1['Sex'], df1['Response']], df1['variable']).rename_axis(None, axis=1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thanks @jezrael, check out this other question if you have time: https://stackoverflow.com/questions/51532581/create-barplot-from-string-data-using-groupby-and-multiple-columns-in-pandas-dat. No bites yet – JAG2024 Jul 27 '18 at 16:24
  • or this one: https://stackoverflow.com/questions/51546293/seaborn-and-pandas-make-multiple-x-category-bar-plot-using-multi-index-data-in +jezrael – JAG2024 Jul 27 '18 at 16:24
  • @JAG2024 - Do you need something like [this](https://stackoverflow.com/a/43999896/2901002) ? – jezrael Jul 28 '18 at 05:16
  • yes, exactly. I was hoping to do it with seaborn but based on my research it looks like that's not possible – JAG2024 Jul 28 '18 at 05:41