My task is to write a function that calculates logarithms of given variables (vars) in a given data set (dset) by levels of a declared variable (byvar). If the minimum of a given variable for a given level of byvar is greater than 0, a simple natural logarithm is calculated. Otherwise, the new value of the variable for that level is calculated as:
new.value = log(old.value + 1 + abs(min.value.of.given.var.for.given.level))
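To make the rule concrete, here is a minimal sketch that applies it to a single vector, i.e. the values of one variable within one level of byvar. The helper name shifted_log is hypothetical and used only for illustration; it is not part of the function below.

```r
# Hypothetical helper (illustration only): applies the rule to one vector,
# i.e. to the values of one variable within a single level of byvar.
shifted_log <- function(x) {
  min.value <- min(x)
  if (min.value > 0) {
    # every value is positive, so the plain natural logarithm is defined
    log(x)
  } else {
    # shift by 1 + |minimum| so that the smallest value becomes log(1) = 0
    log(x + 1 + abs(min.value))
  }
}

shifted_log(c(1, exp(1)))  # minimum > 0: same as log(c(1, exp(1)))
shifted_log(c(-2, 0, 3))   # minimum is -2: same as log(c(1, 3, 6))
```

Because the shift depends on the minimum within each level, the same original value can map to different results in different levels.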
To achieve this, I wrote the following code (as a reproducible example):
set.seed(1234567)
data(iris)
iris$random <- rnorm(nrow(iris), 0, 1)
log.vars <- function(dset, vars, byvar, verbose = FALSE){
  # a loop by levels of "byvar"
  for(i in 1:length(unique(dset[[byvar]]))){
    if(verbose == TRUE){
      print(paste0("------ level=", unique(dset[[byvar]])[i], "----"))
    }
    # a loop by variables in "vars"
    for(j in 1:length(vars)){
      min.var <- min(dset[[vars[j]]][dset[[byvar]] == unique(dset[[byvar]])[i]])
      # if the minimum of a given variable for a given level is greater than 0,
      # calculate its logarithm;
      # otherwise, add 1 and the absolute value of its minimum to its value
      # and calculate the logarithm
      dset[[paste0("ln_", vars[j])]][dset[[byvar]] == unique(dset[[byvar]])[i]] <-
        if(min.var > 0){
          log(dset[[vars[j]]][dset[[byvar]] == unique(dset[[byvar]])[i]])
        } else{
          log(dset[[vars[j]]][dset[[byvar]] == unique(dset[[byvar]])[i]] + 1 +
                abs(min.var))
        }
    }
  }
  return(dset)
}
iris2 <- log.vars(dset = iris,
                  vars = c("Sepal.Length", "random", "Sepal.Width"),
                  byvar = "Species",
                  verbose = TRUE)
head(iris2)
It works; however, there is a clear problem with its readability. Additionally, I wonder whether its performance could be improved. Last but not least, the order of the observations in the data set must be preserved. Any help or suggestions would be appreciated.