
I am trying to run normality tests in batch.

My data looks like:

"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00

I want to check the distribution of billable hours for each Employee ID.

Is there a way to do this in batch? I have over 80 IDs, so taking each ID individually and plotting it or computing descriptive stats for it would be rather tedious.

Thanks

KillerSnail
  • Add a sample of your data to help us understand and answer your problem – Pop Feb 27 '13 at 08:24
  • You could easily split the "Hours" variable by your "Employee_ID" variable and calculate descriptive statistics and generate plots using `lapply` on the resulting list. Show some sample data, and you might get a more concrete answer. – A5C1D2H2I1M1N2O1R2T1 Feb 27 '13 at 08:30
  • These are relevant: http://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r, http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless – Ben Feb 27 '13 at 09:36
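The split-and-`lapply` approach from the second comment could look like the following minimal sketch. It assumes the column names that `read.table()`/`read.csv()` produce by default (`Employee.ID`, `SumOfBillable.Hrs`) and uses only the question's 13 sample rows, typed in directly.

```r
# The question's sample rows (only the two columns needed here).
# read.table()/read.csv() turn "Employee ID" into Employee.ID by default.
data <- data.frame(
  Employee.ID = c(50084, 51870, 50216, 53422, 53765, 53146, 53202,
                  54470, 54525, 51870, 50029, 50216, 53422),
  SumOfBillable.Hrs = c(8, 10, 10, 9, 10, 9, 9, 9, 9, 10, 8.5, 9, 4)
)

# split() returns a named list: one numeric vector of hours per employee ID
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# lapply() applies the same summary to every element of that list
stats <- lapply(hours_by_id, function(h) {
  c(n = length(h), mean = mean(h), sd = sd(h))
})

# Stack the per-employee vectors into one matrix, one row per ID
stats_table <- do.call(rbind, stats)
print(stats_table)
```

In the sample most IDs have only one or two rows, so `sd` is `NA` wherever an ID has a single observation; with the full data the same code covers all 80+ IDs without any per-ID work.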

1 Answer

You could start with something like this. If you want something different, you would need to say more specifically what you want to do with the results.

data <- read.table(header=TRUE, sep=",", 
 text='"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00')



# Means:
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=mean)

# Standard Deviations:
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=sd)

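# Or a Shapiro-Wilk normality test. shapiro.test() needs 3 to 5000 values that
# are not all identical, so return NA for IDs that don't qualify and keep only
# the p-value (passing shapiro.test directly stops with an error on the first
# ID with fewer than 3 observations):
shapiro_p <- function(x) {
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) == 1) return(NA_real_)
  shapiro.test(x)$p.value
}
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=shapiro_p)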
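For per-employee normal distribution plots, one option is a normal Q-Q plot for each ID, written as one page per employee into a PDF. This is a minimal sketch: it retypes the question's sample rows so it runs on its own, and the output path is a hypothetical temporary file.

```r
# The question's sample rows (only the columns needed here)
data <- data.frame(
  Employee.ID = c(50084, 51870, 50216, 53422, 53765, 53146, 53202,
                  54470, 54525, 51870, 50029, 50216, 53422),
  SumOfBillable.Hrs = c(8, 10, 10, 9, 10, 9, 9, 9, 9, 10, 8.5, 9, 4)
)

# One vector of hours per employee ID
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# Hypothetical output path: a temporary PDF, one page per employee
out_file <- file.path(tempdir(), "qq_by_employee.pdf")
pdf(out_file)
for (id in names(hours_by_id)) {
  h <- hours_by_id[[id]]
  # Normal Q-Q plot: points close to the line suggest roughly normal hours
  qqnorm(h, main = paste("Employee", id))
  qqline(h)
}
invisible(dev.off())
```

With one or two points per ID, as in the sample, these plots say very little; they only become informative once each employee has a reasonable number of days recorded.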
N8TRO
  • The data is stored in an MS Access DB, which I have dumped into a CSV file. For each unique employee ID I want normal distribution plots, the standard deviation and mean, and a normality test. – KillerSnail Feb 27 '13 at 08:52
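Since the data comes from a CSV dump of the Access database, loading it could look like the minimal sketch below. The temporary file only stands in for the real export (whose path is not given) and holds three of the question's sample rows; the day/month/year date format is inferred from values such as "14/01/2013".

```r
# Stand-in for the exported file: a few of the question's rows in a temp CSV.
# With the real dump, point read.csv() at that file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"',
  '"10/09/2012","D","B",50084.00,8.00',
  '"10/09/2012","D","E",53422.00,9.00',
  '"14/02/2013","D","E",53422.00,4.00'
), csv_path)

# read.csv() turns "Employee ID" into Employee.ID (check.names = TRUE)
data <- read.csv(csv_path)

# IDs are exported as numbers like 50084.00; treat them as group labels
data$Employee.ID <- factor(as.integer(data$Employee.ID))

# Dates in the export are day/month/year
data$Date <- as.Date(data$Date, format = "%d/%m/%Y")

str(data)
```

Treating the ID as a factor keeps it from being summarised as a number by accident, and the grouping code in the answer works the same either way.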