I want to test whether male and female users differ significantly in their total plays per user (see the example below):
Example of female entries
female
users artist plays gender age
0 48591 sting 12763 f 25.0
1 48591 stars 8192 f 25.0
Sum plays per unique female user
female_user_plays = female.groupby('users').plays.sum()
female_user_plays
users
5 5479
6 3782
7 7521
11 7160
Example of male entries
male
users artist plays gender age
51 56496 iron maiden 456 m 28.0
52 56496 elle 407 m 28.0
Sum plays per unique male user
male_user_plays = male.groupby('users').plays.sum()
male_user_plays
users
0 3282
1 25329
2 51522
3 1590
Average plays per gender
Average Total Male Plays: 11880
Average Total Female Plays: 13104
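To make the aggregation concrete, here is a minimal sketch of the same two steps (sum plays per user, then average those totals per gender) on a single toy DataFrame. The column names match the ones above, but the rows are made up and the single-frame `groupby` is only one possible way to do it, not necessarily how the averages above were computed:

```python
import pandas as pd

# Toy data, NOT the real dataset: two users per gender, made-up play counts.
df = pd.DataFrame({
    "users":  [1, 1, 2, 3, 3, 4],
    "artist": ["a", "b", "a", "c", "d", "c"],
    "plays":  [100, 50, 30, 200, 100, 20],
    "gender": ["f", "f", "f", "m", "m", "m"],
})

# Total plays per unique user, keeping gender as the outer index level.
user_totals = df.groupby(["gender", "users"])["plays"].sum()

# Average of those per-user totals within each gender.
avg_by_gender = user_totals.groupby(level="gender").mean()
print(avg_by_gender)
```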
Before running the t-test, I converted each Series into a list of values:
female_plays_list = female_user_plays.values.tolist()
male_plays_list = male_user_plays.values.tolist()
And for the t test:
from scipy.stats import ttest_ind
ttest_ind(female_plays_list, male_plays_list, equal_var=False)
The result confuses me: the statistic and p-value seem far off, and I don't think the difference between the two sample sizes explains it.
Ttest_indResult(statistic=-8.9617251652001002, pvalue=3.3195063228833119e-19)
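For reference, my understanding is that `equal_var=False` gives Welch's t-test, whose statistic is the difference in sample means (first argument minus second) divided by the combined standard error. Below is a minimal sketch of that formula using only the standard library. The helper `welch_t` and the lists `group_a` and `group_b` are hypothetical names with made-up numbers, not my real data:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    # Sample variance (n - 1 denominator) divided by sample size, per group.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    # Sign follows mean(a) - mean(b), so argument order matters.
    return (mean(a) - mean(b)) / sqrt(se2_a + se2_b)

# Made-up per-user totals, NOT the real data; the groups differ in length.
group_a = [10, 12, 14, 16]
group_b = [20, 22, 24, 26, 28]

t_ab = welch_t(group_a, group_b)  # negative: group_a has the lower mean
t_ba = welch_t(group_b, group_a)  # same magnitude, opposite sign
print(t_ab, t_ba)
```

If that reading is right, a negative statistic with the female list passed first would imply that the female sample mean is lower than the male one, which is the opposite of the averages above.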
Is there any reason other than the differing array lengths that could be causing this?