From reading other questions (1), I am under the impression that pandas handles boxplots and statistical analysis best when the data is in the following format:
    stimulus  vote
0          1     0
1          1     1
2          1     1
3          1     1
4          1     2
5          1     2
6          1     2
7          1     2
8          1     2
9          1     2
10         1     3
11         1     3
12         1     3
13         1     3
where stimulus is my independent variable and vote is each score given to it.
However, my data already comes grouped by rating, with another column, votes, showing the .count() of each vote.
   stimulus  rating  votes
0         1       0      1
1         1       1      3
2         1       2      6
3         1       3      4
Again, stimulus is my IV, rating is the score scale, and votes is the number of votes given to each score.
I am having trouble working with that format, and I can't figure out how to transform this data back into the "stacked" or "record" format.
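To make the target concrete, here is a minimal sketch of one way the expansion might be done. It assumes the grouped data sits in a DataFrame with the columns stimulus, rating and votes shown above. The names grouped and records are placeholders chosen for illustration. It repeats each row's index label votes times with Index.repeat and then selects those rows.

```python
import pandas as pd

# Grouped data as shown above: one row per (stimulus, rating) pair,
# with `votes` holding how many times that rating was given.
grouped = pd.DataFrame({
    "stimulus": [1, 1, 1, 1],
    "rating": [0, 1, 2, 3],
    "votes": [1, 3, 6, 4],
})

# Repeat each index label `votes` times, select those rows, and rename
# `rating` to `vote` so the result matches the one-row-per-vote layout.
records = (
    grouped.loc[grouped.index.repeat(grouped["votes"]), ["stimulus", "rating"]]
    .rename(columns={"rating": "vote"})
    .reset_index(drop=True)
)

print(records)
```

A rating with zero votes would simply contribute no rows, which is what the record format implies.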
In the end I want to:
- plot the data as a boxplot
- execute a Kruskal-Wallis H-test
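Both goals could be sketched roughly as follows once the data is in record format. This is only an illustration under assumptions. The Kruskal-Wallis test needs at least two groups, so the rows for stimulus 2 are invented here and are not part of my data. The names grouped, records, samples and plot_votes are hypothetical. The test uses scipy.stats.kruskal, and the plot uses DataFrame.boxplot, which needs matplotlib.

```python
import pandas as pd
from scipy.stats import kruskal

# Stimulus 1 matches the data above; stimulus 2 is INVENTED purely so
# that the test has a second group to compare against.
grouped = pd.DataFrame({
    "stimulus": [1, 1, 1, 1, 2, 2, 2, 2],
    "rating":   [0, 1, 2, 3, 0, 1, 2, 3],
    "votes":    [1, 3, 6, 4, 5, 4, 3, 2],
})

# Expand the counts back into one row per individual vote.
records = (
    grouped.loc[grouped.index.repeat(grouped["votes"]), ["stimulus", "rating"]]
    .rename(columns={"rating": "vote"})
    .reset_index(drop=True)
)

# One array of votes per stimulus, passed as separate samples to the test.
samples = [group["vote"].to_numpy() for _, group in records.groupby("stimulus")]
h_stat, p_value = kruskal(*samples)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")


def plot_votes(df):
    """Draw one box per stimulus (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    df.boxplot(column="vote", by="stimulus")
    plt.show()
```

Calling plot_votes(records) would draw the boxplot. It is kept in a function so the statistical part runs without a display.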