Test of Significance for dataset

Asked Oct 25 '19 at 17:58

I'm having trouble with the data analysis for this dataset.

Each client has costs in a pre-enrollment period (time -5 to 0) and a post-enrollment period (time 0 to 5), and I'd like to know how cost changes between these periods. For example, does cost increase or decrease per client, and for the dataset as a whole?
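For the per-client part of the question, one common option is Welch's t-test on each client's pre- and post-enrollment costs. The sketch below uses hypothetical toy data with assumed column names `Client`, `Time`, `Cost`, and treats each cost observation within a period as independent, which may not hold for repeated measurements of the same client over time.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data; 'Client', 'Time', 'Cost' are assumed column names.
df = pd.DataFrame({
    "Client": ["A"] * 6 + ["B"] * 6,
    "Time":   [-3, -2, -1, 1, 2, 3] * 2,
    "Cost":   [100, 105, 98, 130, 128, 135,
               1000, 1040, 990, 1010, 1020, 995],
})

results = {}
for client, g in df.groupby("Client"):
    pre = g.loc[g["Time"] <= 0, "Cost"]   # pre-enrollment rows
    post = g.loc[g["Time"] > 0, "Cost"]   # post-enrollment rows
    # Welch's t-test: compares the two period means without assuming equal variances.
    res = stats.ttest_ind(post, pre, equal_var=False)
    results[client] = (post.mean() - pre.mean(), res.pvalue)

for client, (diff, p) in results.items():
    print(f"{client}: mean change {diff:+.1f}, p = {p:.3g}")
```

This gives one mean change and one p-value per client, but the mean changes are still in each client's own cost units.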

I can compute estimates (mean and standard deviation) for the pre and post periods and test them for significance, but how do I compare this across clients? I can't simply take the mean pre- and post-enrollment cost across the dataset, because each client's costs are on a different scale; for instance, in the scenario below, client A's costs are in the 100s while client B's are in the 1000s.
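One way to remove the scale difference is to express each client's change relative to that client's own pre-enrollment level, for example as a log ratio or a percent change of the period means. This is a minimal sketch on hypothetical data, assuming all costs are positive and using assumed column names `Client`, `Time`, `Cost`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two clients on very different cost scales.
df = pd.DataFrame({
    "Client": ["A"] * 4 + ["B"] * 4,
    "Time":   [-4, -2, 2, 4] * 2,
    "Cost":   [100, 120, 130, 150, 1000, 1200, 1300, 1500],
})

# Label each row as pre-enrollment (Time <= 0) or post-enrollment (Time > 0).
df["period"] = np.where(df["Time"] <= 0, "pre", "post")

# Mean cost per client and period, one column per period.
means = df.groupby(["Client", "period"])["Cost"].mean().unstack("period")

# Unit-free changes: a client in the 100s and one in the 1000s become comparable.
means["log_ratio"] = np.log(means["post"] / means["pre"])
means["pct_change"] = 100 * (means["post"] / means["pre"] - 1)
print(means)
```

Here both clients rise by the same proportion (from 110 to 140 and from 1100 to 1400), so they get the same log ratio even though their raw differences are 30 and 300.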

Once I have a significance result (say, a p-value) for the pre- vs post-enrollment cost of each client, how would I make the comparison across clients?
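Two possible approaches, sketched with made-up per-client numbers (these are not from the question's data): treat each client as a single observation and test whether the average log ratio differs from 0, or combine the per-client p-values with Fisher's method via SciPy's `combine_pvalues`. The first keeps the direction and size of the change; the second only summarizes the evidence.

```python
import numpy as np
from scipy import stats

# Hypothetical per-client summaries (one entry per client, values made up):
# log ratio log(post mean / pre mean) and the per-client p-value.
log_ratios = np.array([0.24, 0.10, -0.05, 0.31, 0.18, 0.07])
per_client_p = np.array([0.01, 0.20, 0.60, 0.002, 0.04, 0.35])

# Option 1: each client is one observation; test whether the average
# log ratio differs from 0 (0 means no change after enrollment).
t_res = stats.ttest_1samp(log_ratios, popmean=0.0)

# Rank-based alternative that does not assume the log ratios are normal.
w_res = stats.wilcoxon(log_ratios)

# Back-transform the mean log ratio to a typical multiplicative change.
typical_ratio = np.exp(log_ratios.mean())

# Option 2: combine per-client p-values with Fisher's method.
# Caveat: two-sided p-values carry no direction, so a small combined p-value
# is evidence that at least one client changed, not that costs rose overall.
fisher_stat, p_combined = stats.combine_pvalues(per_client_p, method="fisher")

print(t_res.pvalue, w_res.pvalue, typical_ratio, p_combined)
```

Option 1 answers "does cost change for clients as a whole?" directly, and `typical_ratio` gives an interpretable size (here roughly a 15% increase).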

edited Oct 25 '19 at 18:14

Charlywiggin

The standard approach for comparing groups is a chi-square test or, for tables larger than 2x2, Fisher's exact test. With these you get a p-value telling you how likely it is that the difference is significant – CLpragmatics Oct 25 '19 at 18:22
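For reference, here is how the two tests named in this comment can be run with SciPy on a hypothetical 2x2 table of counts. Both take counts in categories rather than continuous values, so they do not apply directly to cost amounts like those in the question.

```python
from scipy import stats

# Hypothetical 2x2 table of counts (rows: two groups; columns: outcome yes/no).
table = [[12, 5],
         [6, 11]]

# Chi-square test of independence; for a 2x2 table SciPy applies
# Yates' continuity correction by default.
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Fisher's exact test on the same 2x2 table; also returns the sample odds ratio.
odds_ratio, p_fisher = stats.fisher_exact(table)

print(dof, p_chi2, odds_ratio, p_fisher)
```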
@CLpragmatics Why would you specify Fisher's exact test for tables larger than 2x2? – Glen_b Oct 28 '19 at 03:24
What's the easiest way to group the descriptive statistics (mean and stdev) of the cost variable for each client into pre (-5 to 0) and post (1 to 5) time periods? – Charlywiggin Oct 28 '19 at 15:10
Answered my own question: `ranges = [-5, 0, 5]`
`df['timebins'] = pd.cut(df['Time'], ranges, include_lowest=True)`
and then
`df.groupby(['Client', 'timebins'])['Cost'].describe()` – Charlywiggin Oct 28 '19 at 15:32
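A self-contained version of the `pd.cut` plus `groupby` approach in this comment, on hypothetical toy data with the same column names. By default `pd.cut` builds right-closed bins, (-5, 0] and (0, 5], so `include_lowest=True` is used to keep `Time == -5` in the first bin; the `labels` argument just gives the bins readable names.

```python
import pandas as pd

# Hypothetical toy data using the column names from the comment.
df = pd.DataFrame({
    "Client": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Time":   [-5, -1, 1, 5, -5, 0, 2, 5],
    "Cost":   [100, 110, 140, 150, 1000, 1100, 1300, 1500],
})

ranges = [-5, 0, 5]
# Right-closed bins: Time 0 falls in "pre"; include_lowest keeps Time -5 from becoming NaN.
df["timebins"] = pd.cut(df["Time"], ranges, labels=["pre", "post"], include_lowest=True)

# observed=True reports only client/bin combinations present in the data.
summary = df.groupby(["Client", "timebins"], observed=True)["Cost"].describe()
print(summary[["count", "mean", "std"]])
```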

0 Answers