This is two questions in one; if I should be splitting them, please let me know.
I have a spreadsheet of HR data that I'm going to be cutting into various cross sections. Each row currently represents one employee in one report year (so over a three-year period an employee would appear three times, with a column indicating which year each row refers to), along with a series of other characteristics. I've also added a field showing how many FTEs that employee represents for the period, which is that employee's exposure to risk.
What I'm trying to do, for the sake of marrying it up with other data, is create an n-dimensional array where each cell holds the total exposure to risk for the matching combination of dimension values. In the example I'm using, the dimensions are Year, Company [there are a couple], Age Band, Gender, Division and Tenure Band.
To do so, among other code, I've written the following:
FactorNames <- c("FY","HR Business", "Age Band", "Gender", "Classification Level 1", "Tenure Band")
FactorDim <- lapply(length,mapply(unique,HR[FactorNames]))
Names <- lapply(HR[FactorNames], function(x)sort(unique(x)))
Index <- 1
for (Ten in 1:FactorDim[6]) {
  for (Job in 1:FactorDim[5]) {
    for (Sex in 1:FactorDim[4]) {
      for (Age in 1:FactorDim[3]) {
        for (Co in 1:FactorDim[2]) {
          for (Year in 1:FactorDim[1]) {
            ExpList[Index] = sum(subset(HR,
                                        HR$FY == Names[1,Year],
                                        HR$`HR Business` == Names[2, Co],
                                        HR$`Age Band` == Names[3, Age],
                                        HR$Gender == Names[4, Sex],
                                        HR$`Classification Level 1` == Names[5,Job],
                                        HR$`Tenure Band` == Names[6,Ten],
                                        select=Exposure),
                                 na.rm=TRUE)
            Index <- Index + 1
          }
        }
      }
    }
  }
}
There are two main issues.
- The line
  Names <- lapply(HR[FactorNames], function(x)sort(unique(x)))
  is incorrect, as lapply(HR[FactorNames], function(x)sort(unique(x))) returns the unique values as a single combined element rather than as a vector. This means that the bodies of my for loops throw the error
  Error in Names[1, Year] : incorrect number of dimensions
- There's no way that my concentric for loops are even close to being the optimal way to fill my array, and I was wondering if anyone knew what was.
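The first issue can be reproduced on a small made-up data frame. The sketch below assumes a two-column stand-in for HR (the values are invented, not taken from the real sheet) and shows that lapply() gives back a list with one vector per column, so it has to be indexed with [[ ]] and then [ ] rather than with two subscripts. lengths() is used here as one possible way to get the number of levels per factor.

```r
# Made-up stand-in for the HR data; column names mirror the question.
HR <- data.frame(
  FY = c(2014, 2015, 2015),
  Gender = c("F", "M", "F"),
  stringsAsFactors = FALSE
)
FactorNames <- c("FY", "Gender")

# lapply() returns a named list: one vector of sorted unique values per column.
Names <- lapply(HR[FactorNames], function(x) sort(unique(x)))

# A list has no second dimension, which is why Names[1, Year] fails.
is.list(Names)

# Pick the list element first, then the position inside that vector.
Names[[1]][2]         # second unique value of the first factor
Names[["Gender"]][1]  # same idea, selecting the factor by name

# Number of levels per factor, as a named integer vector.
FactorDim <- lengths(Names)
```

With this shape, a loop bound would be written as FactorDim[[6]] and a lookup as Names[[6]][Ten], assuming the six-factor setup from the question.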
What would you recommend?
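One possible direction for the second issue, sketched under assumptions: base R's tapply() can sum a value within every combination of several grouping columns and return an array with one dimension per column, which would replace the nested loops. The data frame below is invented for illustration and uses only three of the six factors; the column names follow the question.

```r
# Made-up stand-in for the HR data; all values are invented for illustration.
HR <- data.frame(
  FY = c(2014, 2014, 2015, 2015, 2015),
  Gender = c("F", "M", "F", "F", "M"),
  `Age Band` = c("20-29", "30-39", "20-29", "20-29", "30-39"),
  Exposure = c(1.0, 0.5, 1.0, 0.8, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
FactorNames <- c("FY", "Gender", "Age Band")

# Sum Exposure within every combination of the grouping columns.
# The result is an array with one named dimension per factor.
ExpArray <- tapply(HR$Exposure, HR[FactorNames], sum, na.rm = TRUE)

# Combinations that never occur come back as NA; treat them as zero exposure.
ExpArray[is.na(ExpArray)] <- 0

dim(ExpArray)                   # one extent per factor
ExpArray["2015", "F", "20-29"]  # look up a cell by its level names
```

Because R stores arrays with the first dimension varying fastest, as.vector(ExpArray) should walk the cells in the same order as loops in which Year is innermost, so a flat list like ExpList can still be recovered if it is needed.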