Using Spark, I aggregated the data for each group (cohort) so that it only contains the mean, standard deviation, and variance.
Now, in a second step using Python, I would like to test for normality with scipy.stats.normaltest (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html) and afterwards for significance using either the t-test (stats.ttest_ind) or the Wilcoxon rank test (stats.wilcoxon).
However, all these methods expect the data to be fed in as raw record-oriented values. How can I use them with the pre-aggregated data?
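
For the t-test part at least, SciPy appears to offer a summary-statistics variant, stats.ttest_ind_from_stats. Below is a minimal sketch of how it might be called. The numbers are made up for illustration, and it assumes the per-cohort record counts (n_a, n_b) are also available from the aggregation, which is not stated above. It also assumes the standard deviations are sample (n-1) standard deviations. The normality test and the rank test have no such counterpart as far as I can tell, since they depend on higher moments or on the ranks of the individual values.

```python
from scipy import stats

# Hypothetical pre-aggregated values per cohort (made up for illustration).
# The group sizes n_a and n_b are required in addition to mean and std.
mean_a, std_a, n_a = 10.2, 2.1, 500
mean_b, std_b, n_b = 9.8, 2.4, 450

# Welch's t-test computed from summary statistics only;
# equal_var=False means equal variances are not assumed.
result = stats.ttest_ind_from_stats(
    mean_a, std_a, n_a,
    mean_b, std_b, n_b,
    equal_var=False,
)
t_stat, p_value = result.statistic, result.pvalue
print(t_stat, p_value)
```

If the cohort sizes were not kept during aggregation, this route would not work either, because the standard error depends on them.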