1

i've tried every combination of pandas.melt and .stack and .pivot but have not made any progress.

i have an excel table in the following general format:

                1997           1998     1999        2000
Total, N (%)    3350 (34)   3387 (33)   4778 (33)   3588 (33)
Age category N, (%)             
  A             231 (24)    227 (24)    222 (23)    211 (22)
  B             492 (24)    481 (24)    487 (24)    405 (24)
  C             759 (28)    759 (27)    746 (26)    746 (26)
  D            1901 (45)    1873 (44)   1233 (44)   1903 (44)
Sex, N (%)              
  F            1650 (33)    1493 (33)   1673 (33)   1628 (32)
  M            1734 (35)    1794 (34)   1705 (34)   1760 (34)
Diet                
  Vegan        1553 (32)    1442 (31)   1453 (31)   1422 (31)
  Carnivore    1857 (36)    1063 (36)   1225 (35)   1926 (34)
Favorite movie              
  horror       1036 (24)    1033 (24)   1458 (24)   1742 (24)
  romance       732 (41)    743 (40)    735 (40)    799 (38)
  comedy        514 (34)    498 (32)    518 (32)    496 (32)
  silent        1110 (47)   1933 (47)   1967 (46)   1751 (46)
* Percents are in relation to 100% of children who filled out survey                

i'm trying to manipulation this data so i can create a bar graph of:

  • x axis for years 1997 - 2000
  • LEFT y axis as count N
  • RIGHT y axis as %

code from the excel sheet i'm using to build df:

import pandas as pd

categories = ['Age category N, (%)', 'Sex, N (%)', 'Diet', 'Favorite movie']
subcategories = ['A','B','C','D','F','M',"Vegan","Carnivore",'horror','romance','comedy','silent']


df = pd.DataFrame(
    {'1997':    [33850 (34), NaN ,231 (24),  492 (24), 759 (28), 1901 (45), NaN , 1650 (33),    1734 (35),NaN ,
             1553 (32), 1857 (36),NaN , 1036 (24),  732 (41),   514 (34),   1110 (47)],
    '1998': [33687 (33),NaN ,227 (24),  481 (24),   759 (27),   1873 (44),NaN ,1493 (33),   1794 (34),  NaN ,1442 (31), 1063 (36),NaN , 1033 (24),  743 (40),   498 (32),   1933 (47)],
    '1999': [3778 (33), NaN ,222 (23),  487 (24),   746 (26),   1233 (44),NaN   ,   1673 (33),  1705 (34),NaN , 1453 (31),  1225 (35),NaN   ,   1458 (24),  735 (40),   518 (32),   1967 (46)],
    '2000' : [3588 (33),NaN ,211 (22),  405 (24),   746 (26),   1903 (44),  NaN ,   1628 (32),  1760 (34),NaN , 1422 (31),  1926 (34),NaN , 1742 (24),  799 (38),   496 (32),   1751 (46)]},
        index  = pd.MultiIndex.from_tuples(
            [('Age category N, (%)','A'),('Age category N, (%)','B'),('Age category N, (%)','C'),
            ('Age category N, (%)', 'D'), ('Sex, N (%)', 'F'), ('Sex, N (%)', 'M'),
            ('Diet', 'Vegan'), ('Diet', 'Carnivore'),
            ('Favorite Movie','horror'),('Favorite Movie','romance'),('Favorite Movie','comedy'),('Favorite Movie','silent')],
            names = [categories, subcategories]))

i have 2 problems i am looking to pivot table so that rows 'age', 'sex', 'diet' and 'favorite movie' are multi-index columns, with categories beneath each, and with years as rows (observations) so final product would look like:

        'age'       'sex'       'diet'                  'favorite movie'
     A  B   C D     F   M      Vegan  Carnivore     horror  romance comedy  silent 
1997
1998
1999
2000

the sticking point is that i have to separate the '(%)' from the count, while keeping it 'associated' with its number (to inform right y axis)

any guidance truly appreciated !

user426
  • 31
  • 4

0 Answers0