Standardize and rescale each column of every element in a list

Question

I have a list of 5 dataframes like so:

mydf <- data.frame(x = 1:5, y = 21:25, z = rnorm(5), p = rnorm(5), f = rnorm(5))
mylist <- rep(list(mydf), 5)
names(mylist) <- c("2006-01-01", "2006-01-02", "2006-01-03", "2006-01-04", "2006-01-05")

I also have a 3-step formula and the following bits of code I put together:

Step 1 - compute a z-score for every row of the same column. If x is an element of "z", "f" or "p", then:

z = (x - mean(column))/sd(column)

Step 2 - rescale the z-scores from step 1 so that they start from 0:

rz = abs(min(z)) + z

Step 3 - rescale the rz scores from step 2 so that they lie between 0 and 1:

mrz = rz/max(rz)
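
As an illustration, the three steps can be traced on a single toy column. This is only a sketch: the vector `x` below is made up for the example and is not part of the data in the question.

```r
# Toy column; the values are illustrative, not taken from mydf
x <- c(1, 2, 3, 4, 5)

z   <- (x - mean(x)) / sd(x)   # step 1: z-scores
rz  <- abs(min(z)) + z         # step 2: shift so the smallest value is 0
mrz <- rz / max(rz)            # step 3: divide by the largest shifted value

print(round(mrz, 2))           # 0.00 0.25 0.50 0.75 1.00
```

Note that a constant column has `sd(x)` equal to 0, so step 1 would produce `NaN` for such a column.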

I need to apply this formula only to columns "z", "p" and "f" (that is, objective_col <- colnames(mylist$'2006-01-05'[,3:5])) in every element of mylist, using apply, sapply, lapply or another type of loop.

It will probably look something like:

lapply(mylist, FUN = function(x) .......)

Outputs should be in the same layout and format as mydf, all stored in mylist2 <- list()

I will continue to update this as I make more progress. I'm still learning how to use loops and functions. Thanks to anyone who can provide some input.

How do you define **rows z, p, f**? All I see are columns. Do you have a different variable that defines these subsets or is this just a typo? — alexwhitworth, Aug 24 '15 at 23:42

score 2 · Accepted Answer · answered Aug 24 '15 at 23:48


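out <- lapply(mylist, function(x) {
  # overwrite only the three target columns; x and y are left as they are
  x[, c("z", "p", "f")] <- apply(x[, c("z", "p", "f")], 2, function(y) {
    y2 <- scale(y)  # z-scores: (y - mean(y)) / sd(y)
    # shift the z-scores to start at 0, then divide by max(y2)
    return((y2 + abs(min(y2))) / max(y2))
  })
  return(x)
})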
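
A quick way to check that the result keeps the layout the question asks for is to run the code on the example data and compare names, dimensions and the untouched columns. This is a sketch only: the `set.seed(1)` call and the `cols` variable are assumptions added here for reproducibility and brevity.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible
mydf <- data.frame(x = 1:5, y = 21:25, z = rnorm(5), p = rnorm(5), f = rnorm(5))
mylist <- rep(list(mydf), 5)
names(mylist) <- c("2006-01-01", "2006-01-02", "2006-01-03", "2006-01-04", "2006-01-05")

cols <- c("z", "p", "f")  # hypothetical shorthand for the target columns
out <- lapply(mylist, function(x) {
  x[, cols] <- apply(x[, cols], 2, function(y) {
    y2 <- scale(y)
    (y2 + abs(min(y2))) / max(y2)
  })
  x
})

# Layout checks: same element names, same dimensions, x and y unchanged
same_names <- identical(names(out), names(mylist))
same_dim   <- identical(dim(out[[1]]), dim(mydf))
same_xy    <- identical(out[[1]]$x, mydf$x) && identical(out[[1]]$y, mydf$y)
print(c(same_names, same_dim, same_xy))
```

To get the `mylist2` name used in the question, the result can simply be assigned to `mylist2` instead of `out`.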

answered Aug 24 '15 at 23:48

alexwhitworth


You should be able to use `lapply` instead of `apply(...,2,FUN)` – thelatemail Aug 24 '15 at 23:49
Sure, either works ... `lapply(x[c("z", "p", "f")], function(y) ...` as above – alexwhitworth Aug 24 '15 at 23:50
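
A minimal sketch of the `lapply` variant suggested in these comments, assuming a helper named `rescale_col` (a hypothetical name) that follows the three steps from the question, and a small made-up data frame `df`. Single brackets (`df[cols]`) select the columns as a list, which is what `lapply` iterates over.

```r
# Hypothetical helper implementing the question's three steps for one column
rescale_col <- function(y) {
  z  <- c(scale(y))       # c() drops the matrix attributes that scale() adds
  rz <- z + abs(min(z))   # shift so the smallest value is 0
  rz / max(rz)            # divide by the largest shifted value
}

# Made-up data for illustration only
df <- data.frame(x = 1:5, z = c(2, 4, 6, 8, 10), p = c(5, 1, 3, 2, 4))
cols <- c("z", "p")

# Replace the selected columns in place; x is left untouched
df[cols] <- lapply(df[cols], rescale_col)
print(df)
```

The same function body could be passed to `apply(..., 2, ...)` instead; the `lapply` form avoids converting the data frame to a matrix first.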
I also looked at the `scale` function earlier. Does it represent exactly (x-μ)/σ? The formula was not shown in the documentation, so I figured I'd take the safer route. Thank you for the help, it looks great. – Alex Bădoi Aug 24 '15 at 23:53
Well, technically, \mu and \sigma are unknown parameters. If you know their values (from the population), you can use them. `scale()` uses the sample estimates of \mu and \sigma (i.e. the sample mean and the sample standard deviation, which `sd()` computes with an n - 1 denominator). – alexwhitworth Aug 25 '15 at 00:02
@AlexBădoi - it gives the same results - test it: `all.equal(c(scale(1:10)),((1:10) - mean(1:10))/sd(1:10))` – thelatemail Aug 25 '15 at 00:11
@Alex - you seem to have combined the last 2 steps into one line, like so: `return((y2 + abs(min(y2))) / max(y2))`. Is the final `max(y2)` taken from the values rescaled from 0 in your code? The last step of my formula uses the rescaled z-values. Would it be correct to divide by `(max(y2) + min(y2))`? – Alex Bădoi Aug 25 '15 at 10:15
Actually, `(max(y2) + min(y2))` would return just the `max` of `y2`, which is the z score, minus `min`. Maybe divide by `max(y2 + abs(min(y2)))` – Alex Bădoi Aug 25 '15 at 10:23
@AlexBădoi I suggest you walk through the steps individually for a single column. It would be useful for you to develop the intuition to be able to understand what the code is doing without having to run it... that is how to learn – alexwhitworth Aug 25 '15 at 19:52
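
Following that suggestion, here is a sketch of such a walk-through for a single column, comparing the two divisors discussed in these comments. The column `y` and the variable names are made up for the illustration.

```r
y  <- c(1, 2, 3, 4, 5)          # illustrative column
y2 <- c(scale(y))               # z-scores
shifted <- y2 + abs(min(y2))    # z-scores rescaled from 0

by_max_z       <- shifted / max(y2)       # divisor in the answer's one-liner
by_max_shifted <- shifted / max(shifted)  # divisor from step 3 of the question

print(range(by_max_z))
print(range(by_max_shifted))
```

For this symmetric toy column the first version runs from 0 to 2, while the second runs from 0 to 1 as step 3 requires.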
