t-test for two clusters Python

Question

I am doing k-means clustering and want to test whether the resulting clusters are statistically different. In 3-level clustering, I test cluster 0 against cluster 1 and then against cluster 2. Then I test cluster 2 with cluster 3. I tried to apply a t-test as shown in the following code. The clusters have different lengths, as you know. I am confused about the logic: should I use p>0.05 or p<0.05? And where should I put True and False?

    from scipy.stats import ttest_ind

    def compare_2_groups(ar1, ar2):
        # s is the t statistic, p is the two-sided p-value
        s, p = ttest_ind(ar1, ar2)
        #if p>0.05:
        if p < 0.05:
            return False
        else:
            return True

score 1 · Accepted Answer · answered Dec 08 '21 at 18:11

This procedure should work, even if ar1 and ar2 have different lengths. The p-value indicates the strength of evidence against the null hypothesis that the two clusters have the same center (mean); a smaller p indicates stronger evidence. Two suggestions:

Rename the function to reflect the nature of the test, for example "are_group_centers_equal".
With that name, return False if p < (your threshold) and True otherwise.

If you choose a name with the opposite meaning "are_group_centers_different", reverse the logic of the threshold test, returning True if p < (threshold).
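
As a rough illustration of the two naming options, here is a minimal sketch. The `alpha` parameter, the synthetic data, and the use of `equal_var=False` (Welch's t-test, which does not assume equal variances) are assumptions added for the example; the original code calls `ttest_ind` with its defaults.

```python
import numpy as np
from scipy.stats import ttest_ind


def are_group_centers_different(ar1, ar2, alpha=0.05):
    """Return True when the t-test rejects the null hypothesis of equal means."""
    # equal_var=False selects Welch's t-test (an assumption of this sketch).
    _, p = ttest_ind(ar1, ar2, equal_var=False)
    return bool(p < alpha)


def are_group_centers_equal(ar1, ar2, alpha=0.05):
    """Opposite naming: True when the test does NOT reject equal means."""
    return not are_group_centers_different(ar1, ar2, alpha)


# Hypothetical clusters of different lengths with clearly separated means.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=200)
cluster_b = rng.normal(loc=5.0, scale=1.0, size=120)

print(are_group_centers_different(cluster_a, cluster_b))  # True
print(are_group_centers_equal(cluster_a, cluster_b))      # False
```

Note that a True result from `are_group_centers_equal` only means the test found no significant difference at the chosen threshold; it is not proof that the centers are identical.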

Dear AbbeGijly, If I have clusters that have numeric and categorical values, can I still use the t-test? and is there Python t-test code for mixed values? — Saif, Dec 28 '21 at 16:01
