batch normal distribution test

Question

I am trying to do a batch normal distribution test.

My data looks like:

"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00

I want to check the distribution of hours for each Employee ID.

Is there a way to do this in batch? I have over 80 IDs, so taking each ID individually and plotting it or computing descriptive stats for it would be rather tedious.

Thanks

Add a sample of your data to help us understand and answer your problem — Pop, Feb 27 '13 at 08:24
You could easily split the "Hours" variable by your "Employee_ID" variable and calculate descriptive statistics and generate plots using `lapply` on the resulting list. Show some sample data, and you might get a more concrete answer. — A5C1D2H2I1M1N2O1R2T1, Feb 27 '13 at 08:30
These are relevant: http://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r, http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless — Ben, Feb 27 '13 at 09:36
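A minimal sketch of the `split()` + `lapply()` approach suggested in the comments. It assumes the columns are named `Employee.ID` and `SumOfBillable.Hrs`, which is how `read.table()`/`read.csv()` rename the headers by default. Only those two columns are rebuilt from the sample data, and the names `hours_by_id` and `stats_table` are illustrative.

```r
# Employee IDs and billable hours copied from the sample data in the question
data <- data.frame(
  Employee.ID = c(50084, 51870, 50216, 53422, 53765, 53146, 53202,
                  54470, 54525, 51870, 50029, 50216, 53422),
  SumOfBillable.Hrs = c(8, 10, 10, 9, 10, 9, 9, 9, 9, 10, 8.5, 9, 4)
)

# summary() adds an "NA's" entry when values are missing, which would give
# the per-employee vectors different lengths, so drop missing hours first
data <- data[!is.na(data$SumOfBillable.Hrs), ]

# One numeric vector of hours per employee, named by Employee ID
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# Min, quartiles, median, mean, max and standard deviation for every employee
stats <- lapply(hours_by_id, function(h) c(summary(h), sd = sd(h)))

# Stack the per-employee vectors into a matrix with one row per Employee ID
stats_table <- do.call(rbind, stats)
print(stats_table)
```

With the full CSV, `stats_table` would have one row per employee (over 80 here). Employees with a single observation get `NA` for `sd`.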

N8TRO · Answer 1 · 2013-02-27T09:32:04.507

You could start with something like this. If you want something different, you'll need to say more specifically what you want to do with the data.

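# check.names = TRUE (the default) turns "Employee ID" into Employee.ID
# and "SumOfBillable Hrs" into SumOfBillable.Hrs
data <- read.table(header=TRUE, sep=",",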
 text='"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00')

# Means:
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=mean)

# Standard Deviations:
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=sd)

# Or a Shapiro-Wilk normality test (only works if every Employee.ID has between 3 and 5000 observations that are not all identical):
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=shapiro.test)
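The three `aggregate()` calls above can also be folded into one table. This is a sketch under assumptions, not part of the original answer: the helper `safe_shapiro_p()` is hypothetical and returns `NA` instead of stopping in the cases where `shapiro.test()` raises an error (fewer than 3 or more than 5000 observations, or all values identical). Only the two relevant columns are rebuilt from the sample data.

```r
# Employee IDs and billable hours copied from the sample data in the question
data <- data.frame(
  Employee.ID = c(50084, 51870, 50216, 53422, 53765, 53146, 53202,
                  54470, 54525, 51870, 50029, 50216, 53422),
  SumOfBillable.Hrs = c(8, 10, 10, 9, 10, 9, 9, 9, 9, 10, 8.5, 9, 4)
)

# Hypothetical helper: Shapiro-Wilk p-value, or NA when shapiro.test()
# would stop with an error (it needs 3 to 5000 values, not all identical)
safe_shapiro_p <- function(h) {
  if (length(h) < 3 || length(h) > 5000 || length(unique(h)) == 1) {
    return(NA_real_)
  }
  shapiro.test(h)$p.value
}

# A single aggregate() call; because FUN returns a named vector, the hours
# column of the result is a matrix with one column per statistic
summary_by_id <- aggregate(
  SumOfBillable.Hrs ~ Employee.ID, data = data,
  FUN = function(h) c(n = length(h), mean = mean(h), sd = sd(h),
                      shapiro.p = safe_shapiro_p(h))
)

# Flatten the matrix column into ordinary columns such as SumOfBillable.Hrs.mean
summary_by_id <- do.call(data.frame, summary_by_id)
print(summary_by_id)
```

On this sample no employee has 3 or more rows, so every `shapiro.p` is `NA`. With the real data, the caveats about normality testing linked in the comments still apply.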

The data is stored in an MS Access database, which I have exported to a CSV file. For each unique employee ID I want normal distribution plots, the standard deviation, the mean, and a normality test. — KillerSnail, Feb 27 '13 at 08:52
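For the plots, here is a sketch that assumes a normal Q-Q plot is an acceptable "normal distribution plot". It writes one page per Employee ID into a single PDF, using base R's `qqnorm()` and `qqline()`. The file name `qq_by_employee.pdf` is hypothetical, and the data frame again holds only the two relevant columns from the sample.

```r
# Employee IDs and billable hours copied from the sample data in the question
data <- data.frame(
  Employee.ID = c(50084, 51870, 50216, 53422, 53765, 53146, 53202,
                  54470, 54525, 51870, 50029, 50216, 53422),
  SumOfBillable.Hrs = c(8, 10, 10, 9, 10, 9, 9, 9, 9, 10, 8.5, 9, 4)
)

# One numeric vector of hours per employee, named by Employee ID
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# Hypothetical output file; every plot below becomes one page of this PDF
pdf_file <- file.path(tempdir(), "qq_by_employee.pdf")
pdf(pdf_file)

# One normal Q-Q plot per employee, with the sample size in the title
for (id in names(hours_by_id)) {
  h <- hours_by_id[[id]]
  qqnorm(h, main = paste0("Employee ", id, " (n = ", length(h), ")"))
  qqline(h)  # reference line through the first and third quartiles
}
invisible(dev.off())
```

Points that follow the reference line suggest roughly normal hours. With only one or two observations per ID, as in this sample, a Q-Q plot carries very little information.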
