I have a dataset below:

  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9 

How do I normalize every column except the first so that its values stay within a set deviation from that column's mean?

For example, these are the means of the columns:

B = 4
C = 6.333
D = 20

I then want to normalize each column so that no value lies more than 25% of the mean away from the mean, in either direction.
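Reading "no more than 25% of the mean in either direction" as an interval centred on each column mean m, the target ranges for the example data work out as follows (assuming positive means; for a negative mean the two endpoints swap):

$$
[\,m - 0.25\,m,\; m + 0.25\,m\,] = [\,0.75\,m,\; 1.25\,m\,]
$$

$$
B:\ [0.75 \cdot 4,\; 1.25 \cdot 4] = [3,\; 5] \qquad
C:\ [0.75 \cdot \tfrac{19}{3},\; 1.25 \cdot \tfrac{19}{3}] \approx [4.75,\; 7.917] \qquad
D:\ [0.75 \cdot 20,\; 1.25 \cdot 20] = [15,\; 25]
$$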

I think this can be done with rescale, but I don't know how to apply it to all columns:

library(scales)
rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x)))

I know the following is one way to normalize, but it doesn't take the bounds of 25% of the mean into account:

library(dplyr)

# min-max normalization onto [0, 1]
normalized <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

normalized_dataset <- df %>% 
  mutate_at(vars(-one_of("A")), normalized)
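Since the min-max function already maps each column onto [0, 1], stretching that unit interval onto [mean - 25% of mean, mean + 25% of mean] gives the requested bounds. A minimal base R sketch, assuming the data frame is called df with the ID column A first; the helper name bounded and its pct argument are hypothetical:

```r
# Example data from the question
df <- read.table(text = "
  A  B  C  D
500  2  4  6
501  6  8 45
502  4  7  9
", header = TRUE)

# min-max normalization onto [0, 1] (the function from the question)
normalized <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

# Hypothetical helper: map [0, 1] onto [mean - pct*mean, mean + pct*mean]
bounded <- function(x, pct = 0.25) {
  m  <- mean(x)
  lo <- m - pct * m
  hi <- m + pct * m
  lo + normalized(x) * (hi - lo)
}

# Apply to every column except the first (A), which is left untouched
normalized_dataset <- df
normalized_dataset[-1] <- lapply(df[-1], bounded)
print(normalized_dataset)
```

The minimum of each column lands on the lower bound and the maximum on the upper bound, which is the same linear map the rescale-based answers below produce.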
nak5120

3 Answers

I assume the function rescale comes from package scales.

This is a typical example of the use of the *apply family of functions.
I will work on a copy of the data and rescale the copy; if you don't need to keep the original, it's simple to modify the code below.

dat2 <- dat

dat2[-1] <- lapply(dat2[-1], function(x)
    scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))

dat2
#    A B        C        D
#1 500 3 4.750000 15.00000
#2 501 5 7.916667 25.00000
#3 502 4 7.125000 15.76923

Data.

dat <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9 
", header = TRUE)
Rui Barradas
  • Maybe it's what the OP wants, but then your columns don't have the same mean anymore, and if the value of the mean is not important, why would one use it to set the range of the data? – moodymudskipper May 24 '18 at 18:51
  • @Moody_Mudskipper I don't know, that's a good question for the OP. – Rui Barradas May 24 '18 at 18:53
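To check the point raised in the first comment, one can compare column means before and after rescaling. This sketch reimplements in base R the linear map scales::rescale() applies by default (from range(x) onto to), so it runs without the package; rescale_to is a hypothetical name:

```r
dat <- read.table(text = "
  A  B  C  D
500  2  4  6
501  6  8 45
502  4  7  9
", header = TRUE)

# Same linear map as scales::rescale() with its default from = range(x)
rescale_to <- function(x, to) {
  (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1]) + to[1]
}

dat2 <- dat
dat2[-1] <- lapply(dat2[-1], function(x)
  rescale_to(x, to = c(mean(x) - 0.25 * mean(x), mean(x) + 0.25 * mean(x))))

# Column means before and after rescaling
before <- colMeans(dat[-1])
after  <- colMeans(dat2[-1])
print(rbind(before, after))
```

Here only B keeps its mean, because its values happen to be symmetric around it; the means of C and D shift to roughly 6.60 and 18.59.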

If you already have code that does what you need but struggle to apply it to all columns except the first, a simple base R approach works.

Your function:

## your rescale function
fun1 <- function(x){
    return(scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))
}

Apply to all columns except the first:

dat[2:4] <- lapply(dat[2:4], fun1)
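If columns may be added or reordered, selecting them by name rather than by position (2:4) is a bit more robust. A sketch, assuming the ID column is always named A; fun2 and its pct argument are hypothetical, and the body is the base R equivalent of the default scales::rescale() map so that it runs without the package:

```r
dat <- read.table(text = "
  A  B  C  D
500  2  4  6
501  6  8 45
502  4  7  9
", header = TRUE)

# Hypothetical variant of fun1 with a configurable fraction of the mean
fun2 <- function(x, pct = 0.25) {
  to <- mean(x) * c(1 - pct, 1 + pct)   # target interval around the mean
  (x - min(x)) / diff(range(x)) * diff(to) + to[1]
}

# Pick the columns to transform by name, so column order does not matter
cols <- setdiff(names(dat), "A")
dat[cols] <- lapply(dat[cols], fun2, pct = 0.25)
print(dat)
```

Extra arguments after the function in lapply() are passed on to it, which is how pct reaches fun2.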
onlyphantom

Would this work?

df <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9", header = TRUE)

df2 <- df
df2[-1] <- lapply(df[-1], function(x) mean(x) + (x - mean(x)) * 0.25 * mean(x) / max(abs(x - mean(x))))

#     A B        C    D
# 1 500 3 4.750000 17.2
# 2 501 5 7.464286 25.0
# 3 502 4 6.785714 17.8

The mean stays the same for each relevant column, but the values are rescaled so that the value furthest from the mean lies exactly 25% of the mean away from it.
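Written out for a column with values x_i and mean m, the transformation in this answer is as follows; the second line shows why the mean is preserved (deviations from the mean sum to zero) and the third why the furthest value ends up exactly 0.25 m from it (assuming m > 0 and not all values equal):

$$
y_i = m + (x_i - m)\,\frac{0.25\,m}{\max_j |x_j - m|}
$$

$$
\frac{1}{n}\sum_i y_i = m + \frac{0.25\,m}{\max_j |x_j - m|}\cdot\frac{1}{n}\sum_i (x_i - m) = m + 0 = m
$$

$$
\max_i |y_i - m| = \frac{0.25\,m}{\max_j |x_j - m|}\,\max_i |x_i - m| = 0.25\,m
$$

For column D, m = 20 and the largest deviation is 25, so every deviation is scaled by 5/25 = 0.2 and 45 maps to 25. Unlike the rescale approach, only the most extreme value touches a bound; the other side may stay inside it (D's minimum becomes 17.2, not 15).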

moodymudskipper