T-testing total plays per user by gender - Python

Question

I want to test whether male and female users differ significantly in their total plays per user (see the example below):

Example of female entries

female

    users   artist  plays   gender  age
0   48591   sting   12763   f       25.0
1   48591   stars   8192    f       25.0

Sum plays per unique female user

female_user_plays = female.groupby('users').plays.sum()

female_user_plays

users
5         5479
6         3782
7         7521
11        7160

Example of male entries

male

    users   artist         plays    gender  age
51  56496   iron maiden    456      m       28.0
52  56496   elle           407      m       28.0

Sum plays per unique male user

male_user_plays = male.groupby('users').plays.sum()
male_user_plays

users
0         3282
1        25329
2        51522
3         1590

Average plays per gender

Average Total Male Plays: 11880
Average Total Female Plays: 13104

Before trying the t test, I converted each Series into value lists:

female_plays_list = female_user_plays.values.tolist()
male_plays_list = male_user_plays.values.tolist()

And for the t test:

from scipy.stats import ttest_ind

ttest_ind(female_plays_list, male_plays_list, equal_var=False)

The result is what confuses me: the output seems far off, and I don't think it's due to the difference in size between the two samples.

Ttest_indResult(statistic=-8.9617251652001002, pvalue=3.3195063228833119e-19)

Is there any reason outside of array length that could be causing this?

what does the users value stand for? are these the complete figures or just a sample? — manandearth, Mar 03 '18 at 19:52
@manandearth I've revised my code for clarity. The values are total plays per unique user: all plays across the artists one user listened to were summed, and that sum is the value for that user. I want to compare total plays per user between males and females — Mr. Jibz, Mar 04 '18 at 14:32

score 1 · Accepted Answer · answered Mar 05 '18 at 16:35

A test on two arrays of 100,000,000 random integers each, drawn from 1 to 9999 (np.random.randint excludes the upper bound), gives the following result:

In []: import numpy as np

In []: from scipy.stats import ttest_ind

In []: try1 = np.random.randint(1, 10000, 100000000)

In []: try2 = np.random.randint(1, 10000, 100000000)

In []: ttest_ind(try1, try2, equal_var=False)
Out[]: Ttest_indResult(statistic=-0.67549204672468233, pvalue=0.49936320345035146)

and a test with arrays of unequal lengths (try1 redrawn with 1,000,000 values, try2 unchanged) gives the following:

In []: try1 = np.random.randint(1, 10000, 1000000)

In []: ttest_ind(try1, try2, equal_var=False)
Out[]: Ttest_indResult(statistic=-0.39754328321364363, pvalue=0.6909669583715552)
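
For anyone who wants to repeat this without allocating 100,000,000 values, here is a lighter, seeded sketch of the same unequal-length comparison. The names small1 and small2, the array sizes, and the seed are arbitrary choices for illustration, not part of the runs above, and the printed numbers will differ from the ones shown.

```python
import numpy as np
from scipy.stats import ttest_ind

# Fixed seed so the run is repeatable; 12345 is an arbitrary choice.
rng = np.random.default_rng(12345)

# Same value range as above (1 to 9999; the upper bound is exclusive),
# but with far fewer values and deliberately unequal lengths.
small1 = rng.integers(1, 10000, size=10_000)
small2 = rng.integers(1, 10000, size=1_000_000)

# Welch's t test (equal_var=False), as in the question.
result = ttest_ind(small1, small2, equal_var=False)
print(result.statistic, result.pvalue)
```

Both arrays come from the same distribution here, so the different lengths alone are not expected to produce an extreme statistic.
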

So unless I overlooked something in my test, or your arrays are much longer than these, the result must come from the specific values in your arrays.
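
To look at the specific values, it may help to reproduce by hand what ttest_ind(..., equal_var=False) computes, namely Welch's t statistic. The sketch below uses only the standard library. welch_t is a hypothetical helper name, and the toy input is just the first four per-user totals printed in the question, not the full data. The sign of the statistic follows mean(first) - mean(second), so a negative value with the female list passed first means the female sample mean is the lower one, which is worth checking against the averages quoted in the question.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance (n - 1 denominator) divided by the sample size.
    va = statistics.variance(a) / n_a
    vb = statistics.variance(b) / n_b
    # The numerator is mean(a) - mean(b), so argument order sets the sign.
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Toy input: only the first four totals shown in the question.
female_totals = [5479, 3782, 7521, 7160]
male_totals = [3282, 25329, 51522, 1590]

t, df = welch_t(female_totals, male_totals)
print(f"t = {t:.3f}, df = {df:.1f}")
print("female mean:", statistics.fmean(female_totals))
print("male mean:", statistics.fmean(male_totals))
```

Printing the two means, variances, and lengths for the full Series next to the statistic should show which of those inputs drives the result.
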