Normalize by set standard deviation from mean of every column (excluding first)

Question

I have a dataset below:

  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9

How do I normalize every column except the first so that each column's values stay within a set distance of that column's mean?

So for example below are the means for each column:

B = 4
C = 6.333
D = 20

I then want to normalize each column so that no value lies more than 25% of the mean away from the mean, in either direction.

I think you can do it with rescale but I just don't know how to apply it to all columns:

library(scales)
rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x)))

I know the following is one way to normalize, but it doesn't take into account the bounds of 25% around the mean:

normalized <- function(x){
  return((x-min(x)) / (max(x)-min(x)))
}

library(dplyr)

normalized_dataset <- df %>%
  mutate_at(vars(-one_of("A")), normalized)

I don't know how to calculate it but I want to replace the current columns except for the first one with the new normalized values. — nak5120, May 24 '18 at 18:36
can you confirm that you're talking about `scales::rescale`, and add the library call to your question ? — moodymudskipper, May 24 '18 at 18:52

score 1 · Accepted Answer · answered May 24 '18 at 18:43

I assume the function rescale comes from the scales package.

This is a typical example of the use of the *apply family of functions.
I will work on a copy of the data and rescale the copy. If you don't want to keep the original, it's a simple matter to modify the code below.

dat2 <- dat

dat2[-1] <- lapply(dat2[-1], function(x)
    scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))

dat2
#    A B        C        D
#1 500 3 4.750000 15.00000
#2 501 5 7.916667 25.00000
#3 502 4 7.125000 15.76923

Data.

dat <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9 
", header = TRUE)
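
To see why the column means shift under this approach, it may help to write out the linear map that the rescale call applies. The sketch below is a base R stand-in, assuming rescale maps the column minimum to the lower bound and the column maximum to the upper bound. The helper name rescale_to_band and its pct argument are made up for illustration and are not part of scales.

```r
dat <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9
", header = TRUE)

# Hypothetical helper: linear min-max map onto [mean - pct*mean, mean + pct*mean]
rescale_to_band <- function(x, pct = 0.25) {
  m  <- mean(x)
  lo <- m - pct * m
  hi <- m + pct * m
  lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
}

dat2 <- dat
dat2[-1] <- lapply(dat2[-1], rescale_to_band)

# Each column now spans the original mean minus/plus 25%
sapply(dat2[-1], range)

# Compare column means before and after; they move when the data
# are not symmetric about their mean
rbind(before = colMeans(dat[-1]), after = colMeans(dat2[-1]))
```

Note that a constant column would make max(x) - min(x) zero, so this sketch would return NaN for it.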

Maybe it's what the OP wants, but then your columns don't have the same mean anymore, and if the value of the mean is not important, why would one use it to set the range of the data? — moodymudskipper, May 24 '18 at 18:51
@Moody_Mudskipper I don't know, that's a good question for the OP. — Rui Barradas, May 24 '18 at 18:53

score 1 · Answer 2 · answered May 24 '18 at 18:43

If you already have code that does what you need but struggle to apply it to all columns except the first, try the simple base R approach.

Your function:

## your rescale function
fun1 <- function(x) {
    return(scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))
}

Apply to all columns except the first:

dat[2:4] <- lapply(dat[2:4], fun1)
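
If the number or order of columns might change, selecting the target columns by name can be more robust than hard-coding 2:4. Here is a minimal sketch of that idea. The fun1 defined here is a base R stand-in, assumed to behave like the rescale-based fun1 above, so that the snippet runs without extra packages.

```r
dat <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

# Base R stand-in for fun1: maps min(x) to 0.75*mean and max(x) to 1.25*mean
fun1 <- function(x) {
  m <- mean(x)
  0.75 * m + (x - min(x)) / (max(x) - min(x)) * 0.5 * m
}

# Choose columns by name so that added or reordered columns don't break the code
cols <- setdiff(names(dat), "A")
dat[cols] <- lapply(dat[cols], fun1)
dat
```

To skip more columns, list them all in the second argument of setdiff().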

score 1 · Answer 3 · moodymudskipper · 2018-05-24T19:02:37.913

Would this work?

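df <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9", header = TRUE)

# Keep each column's mean and shrink the deviations so that the largest
# one equals 25% of the mean
df2 <- df
df2[-1] <- lapply(df[-1], function(x) {
  mean(x) + (x - mean(x)) * 0.25 * mean(x) / max(abs(x - mean(x)))
})
df2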

#     A B        C    D
# 1 500 3 4.750000 17.2
# 2 501 5 7.464286 25.0
# 3 502 4 6.785714 17.8

The mean stays the same for each relevant column, but the values are rescaled so that the value furthest from the mean ends up exactly 25% of the mean away from it.
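
Both claims can be checked directly. The sketch below wraps the same formula in a helper so the 25% becomes a parameter. The names shrink_to_mean and pct are hypothetical and chosen only for illustration.

```r
df <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

# Hypothetical helper: same formula as above, with the fraction as an argument
shrink_to_mean <- function(x, pct = 0.25) {
  m <- mean(x)
  d <- x - m                      # deviations from the mean
  m + d * pct * m / max(abs(d))   # largest deviation becomes pct * m
}

df2 <- df
df2[-1] <- lapply(df[-1], shrink_to_mean)

# Check 1: column means are unchanged
all.equal(colMeans(df2[-1]), colMeans(df[-1]))

# Check 2: the largest deviation, as a fraction of the mean, equals pct
sapply(df2[-1], function(x) max(abs(x - mean(x))) / mean(x))
```

Like the original one-liner, this assumes positive column means and at least one value that differs from the mean; a constant column would divide by zero.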

edited May 24 '18 at 19:02

answered May 24 '18 at 18:43

moodymudskipper


sorry can this actually be applied to the first two columns? – nak5120 May 24 '18 at 18:58
this looks pretty good though, just need to skip the first two rather than just the first – nak5120 May 24 '18 at 18:58
You'd have to type [-(1:2)] instead of [-1]: `df2[-(1:2)] <- lapply(df[-(1:2)] ...` – moodymudskipper May 24 '18 at 19:00
