I'm having trouble with the data analysis for this dataset.

Each client has cost data for a pre-enrollment period (-5 to 0) and a post-enrollment period (0 to 5), and I'd like to know how cost changes between these periods. For example, does cost increase or decrease for each client, and for the dataset as a whole?

For a single client I can compute the estimates (mean and standard deviation) for the pre and post periods and test whether they differ significantly, but how do I compare this across clients? I can't simply take the mean pre- and post-enrollment cost across the whole dataset, because each client has a different range of cost; for instance, client A has costs in the 100s while client B has costs in the 1000s.

How would you make the comparison across clients once I have tested the pre- versus post-enrollment difference in cost for each client (say, with one p-value per client)?
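
To make the scale problem concrete, here is a minimal sketch of one possible way to put clients on a common footing: express each client's change as a ratio of post to pre mean cost, then test the log ratios across clients. The per-client means are made-up illustration values, the column names `pre`, `post`, `log_ratio` and `pct_change` are hypothetical, and the one-sample t-test is only one option under the assumption that clients are independent (a Wilcoxon signed-rank test would be a nonparametric alternative).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical per-client mean costs (illustration only, not real data).
means = pd.DataFrame(
    {"pre": [110.0, 1050.0, 40.0, 520.0],
     "post": [160.0, 850.0, 55.0, 610.0]},
    index=["A", "B", "C", "D"],
)

# Ratios are scale-free: a client in the 100s and one in the 1000s
# become directly comparable.
means["pct_change"] = 100 * (means["post"] / means["pre"] - 1)
means["log_ratio"] = np.log(means["post"] / means["pre"])

# Test whether the average log ratio across clients differs from 0
# (0 would mean no relative change on average).
t_stat, p_value = stats.ttest_1samp(means["log_ratio"], 0.0)

print(means)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Working on the log scale treats a doubling and a halving as equal and opposite changes, which a raw percentage does not.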

  • The standard approach for comparing groups is a chi-square test or, for tables larger than 2x2, Fisher's exact test. With these you get a p-value telling you how likely it is that the difference is significant – CLpragmatics Oct 25 '19 at 18:22
  • @CLpragmatics Why would you specify Fisher's exact test for tables larger than 2x2? – Glen_b Oct 28 '19 at 03:24
  • What's the easiest way to group the descriptive statistics (mean and stdev) of the cost variable for each client into pre (-5 to 0) and post (1 to 5) time periods? – Charlywiggin Oct 28 '19 at 15:10
  • Answered my own question: `ranges = [-5,0,5]`
    `df['timebins'] = pd.cut(df['Time'], ranges)`
    and then
    `df.groupby(['Client', 'timebins'])['Cost'].describe()`
    – Charlywiggin Oct 28 '19 at 15:32
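
A self-contained version of the snippet in the last comment might look like the sketch below. The toy data is invented for illustration; the column names `Client`, `Time`, `Cost` and `timebins` follow the comment. Two things here are assumptions rather than part of the original snippet: the `pre`/`post` labels, and `include_lowest=True`, which is added because `pd.cut` otherwise leaves the left edge of the first bin open, so rows with `Time == -5` would fall outside every bin.

```python
import pandas as pd

# Hypothetical toy data using the column names from the comment.
df = pd.DataFrame({
    "Client": ["A"] * 4 + ["B"] * 4,
    "Time":   [-5, -1, 1, 5, -5, -1, 1, 5],
    "Cost":   [100, 120, 150, 170, 1000, 1100, 900, 800],
})

ranges = [-5, 0, 5]
# Bins are [-5, 0] and (0, 5]; include_lowest=True keeps Time == -5.
df["timebins"] = pd.cut(
    df["Time"], ranges, labels=["pre", "post"], include_lowest=True
)

# Mean, standard deviation and count of Cost per client and period.
summary = (
    df.groupby(["Client", "timebins"], observed=True)["Cost"]
      .agg(["mean", "std", "count"])
)
print(summary)
```

Note that with these bin edges `Time == 0` is counted in the pre period.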

0 Answers