i've tried every combination of pandas.melt and .stack and .pivot but have not made any progress.
i have an excel table in the following general format:
1997 1998 1999 2000
Total, N (%) 3350 (34) 3387 (33) 4778 (33) 3588 (33)
Age category N, (%)
A 231 (24) 227 (24) 222 (23) 211 (22)
B 492 (24) 481 (24) 487 (24) 405 (24)
C 759 (28) 759 (27) 746 (26) 746 (26)
D 1901 (45) 1873 (44) 1233 (44) 1903 (44)
Sex, N (%)
F 1650 (33) 1493 (33) 1673 (33) 1628 (32)
M 1734 (35) 1794 (34) 1705 (34) 1760 (34)
Diet
Vegan 1553 (32) 1442 (31) 1453 (31) 1422 (31)
Carnivore 1857 (36) 1063 (36) 1225 (35) 1926 (34)
Favorite movie
horror 1036 (24) 1033 (24) 1458 (24) 1742 (24)
romance 732 (41) 743 (40) 735 (40) 799 (38)
comedy 514 (34) 498 (32) 518 (32) 496 (32)
silent 1110 (47) 1933 (47) 1967 (46) 1751 (46)
* Percents are in relation to 100% of children who filled out survey
i'm trying to manipulation this data so i can create a bar graph of:
- x axis for years 1997 - 2000
- LEFT y axis as count N
- RIGHT y axis as %
code from the excel sheet i'm using to build df:
import pandas as pd
categories = ['Age category N, (%)', 'Sex, N (%)', 'Diet', 'Favorite movie']
subcategories = ['A','B','C','D','F','M',"Vegan","Carnivore",'horror','romance','comedy','silent']
df = pd.DataFrame(
{'1997': [33850 (34), NaN ,231 (24), 492 (24), 759 (28), 1901 (45), NaN , 1650 (33), 1734 (35),NaN ,
1553 (32), 1857 (36),NaN , 1036 (24), 732 (41), 514 (34), 1110 (47)],
'1998': [33687 (33),NaN ,227 (24), 481 (24), 759 (27), 1873 (44),NaN ,1493 (33), 1794 (34), NaN ,1442 (31), 1063 (36),NaN , 1033 (24), 743 (40), 498 (32), 1933 (47)],
'1999': [3778 (33), NaN ,222 (23), 487 (24), 746 (26), 1233 (44),NaN , 1673 (33), 1705 (34),NaN , 1453 (31), 1225 (35),NaN , 1458 (24), 735 (40), 518 (32), 1967 (46)],
'2000' : [3588 (33),NaN ,211 (22), 405 (24), 746 (26), 1903 (44), NaN , 1628 (32), 1760 (34),NaN , 1422 (31), 1926 (34),NaN , 1742 (24), 799 (38), 496 (32), 1751 (46)]},
index = pd.MultiIndex.from_tuples(
[('Age category N, (%)','A'),('Age category N, (%)','B'),('Age category N, (%)','C'),
('Age category N, (%)', 'D'), ('Sex, N (%)', 'F'), ('Sex, N (%)', 'M'),
('Diet', 'Vegan'), ('Diet', 'Carnivore'),
('Favorite Movie','horror'),('Favorite Movie','romance'),('Favorite Movie','comedy'),('Favorite Movie','silent')],
names = [categories, subcategories]))
i have 2 problems i am looking to pivot table so that rows 'age', 'sex', 'diet' and 'favorite movie' are multi-index columns, with categories beneath each, and with years as rows (observations) so final product would look like:
'age' 'sex' 'diet' 'favorite movie'
A B C D F M Vegan Carnivore horror romance comedy silent
1997
1998
1999
2000
the sticking point is that i have to separate the '(%)' from the count, while keeping it 'associated' with its number (to inform right y axis)
any guidance truly appreciated !