I'm having trouble with the data analysis for this dataset.
Each client has a pre-enrollment period (time -5 to 0) and a post-enrollment period (time 0 to 5), and I'd like to know how cost changes between the two periods. For example, does cost increase or decrease for each client, and for the dataset as a whole?
I can compute the estimators (mean and standard deviation) of cost for the pre and post periods and test whether they differ significantly, but how do I compare this across clients? I can't simply take the mean pre- and post-enrollment cost across the whole dataset, because each client's costs fall in a different range; for instance, in the scenario below, client A has costs in the 100s while client B has costs in the 1000s.
Once I have tested the pre- versus post-enrollment difference in cost for each client (giving, say, one p-value per client), how would you make the comparison across clients?
`df['timebins'] = pd.cut(df['Time'], ranges)`
and then
`df.groupby(['Client', 'timebins'])['Cost'].describe()` – Charlywiggin Oct 28 '19 at 15:32
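For anyone who wants to try the suggestion in the comment above, here is a minimal runnable sketch. The toy data, the bin edges in `ranges`, and the `pre`/`post` labels are assumptions made for illustration; only the column names `Client`, `Time`, `Cost` and `timebins` come from the snippet. The relative-change step at the end is not part of the comment. It is just one possible way to put clients with costs in the 100s and the 1000s on a common scale, not a definitive answer to the question.

```python
import pandas as pd

# Hypothetical toy data; column names follow the comment's snippet.
df = pd.DataFrame({
    "Client": ["A"] * 4 + ["B"] * 4,
    "Time":   [-3, -1, 1, 3, -4, -2, 2, 4],
    "Cost":   [100, 120, 150, 170, 1000, 1100, 900, 800],
})

# Assumed bin edges: (-5, 0] is pre-enrollment, (0, 5] is post-enrollment.
# With these edges Time == 0 lands in "pre", and Time == -5 would be left
# unbinned unless include_lowest=True is passed to pd.cut.
ranges = [-5, 0, 5]
df["timebins"] = pd.cut(df["Time"], ranges, labels=["pre", "post"])

# Per-client, per-period summary statistics (count, mean, std, quartiles).
summary = df.groupby(["Client", "timebins"], observed=True)["Cost"].describe()

# Mean cost per period, reshaped to one row per client.
means = summary["mean"].unstack("timebins")

# Assumption: express the change relative to each client's own pre-period mean,
# so that clients with very different cost levels become comparable.
rel_change = (means["post"] - means["pre"]) / means["pre"]

print(summary)
print(rel_change)
```

The per-client relative changes in `rel_change` could then be pooled across clients, for example with a one-sample test against zero, though whether that suits the data depends on how many clients there are and how the costs are distributed.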