
Possible Duplicate:
scale a series between two points in R

Does anyone know of an R function to perform range standardization on a vector? I'm looking to transform variables to a scale between 0 and 1, while retaining rank order and the relative size of separation between values.

Just to be clear, I'm not looking to standardize variables by mean centering and scaling by the SD, as is done in the function scale().

I tried the functions mmnorm() and rangenorm() in the package 'dprep', but these don't seem to do the job.

Community
Steve
  • Identical to http://stackoverflow.com/questions/5468280/scale-a-series-between-two-points-in-r/5468527#5468527 – Andrie Apr 14 '11 at 15:28
  • It is also very much identical to this question on stats.stackexchange: http://stats.stackexchange.com/q/1112/442 – Henrik Apr 14 '11 at 15:51
  • Don't delete it; just close it. It's linked to the other question, so it may be useful for people using the search functionality. – Joshua Ulrich Apr 14 '11 at 16:49
  • That's called **'scaling'**. Scaling can be by all sorts of denominators, not just the variable's SD; so not just the way the R builtin function 'scale()' does it. Admittedly it would be better if the builtin 'scale()' was parameterized to allow min-max scaling, or other possibilities. – smci Nov 20 '17 at 03:42

1 Answer

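# 100 sorted draws from an exponential distribution, used as example data
s <- sort(rexp(100))

# Rescale x linearly so that min(x) maps to 0 and max(x) maps to 1
range01 <- function(x){(x-min(x))/(max(x)-min(x))}

range01(s)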

  [1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
  [7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
 [13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
 [19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
 [25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
 [31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
 [37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
 [43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
 [49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
 [55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
 [61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
 [67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
 [73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
 [79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
 [85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
 [91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
 [97] 0.736486707 0.787250428 0.870874773 1.000000000
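
The random output above changes on every run, so a smaller check that can be verified by hand may help. This is only a sketch: the vector `v` and its values are made up for illustration and are not part of the question.

```r
# Same definition as above, repeated so this snippet runs on its own.
range01 <- function(x){(x-min(x))/(max(x)-min(x))}

v <- c(10, 20, 40)   # illustrative values (assumption, not from the question)
r <- range01(v)
r
# [1] 0.0000000 0.3333333 1.0000000
```

The gaps in `v` are 10 and 20; after rescaling they are 1/3 and 2/3, so the rank order and the 1:2 ratio between the gaps are both retained.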

Adding ... to the argument list lets you pass na.rm = TRUE through to min() and max() if you want to omit missing values from the calculation (they will still be present in the result as NA):

range01 <- function(x, ...){(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
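
To illustrate the effect of `na.rm`, here is a short sketch using the variadic version; the vector `y` is a made-up example.

```r
# Variadic version from the answer, repeated so the snippet is self-contained.
range01 <- function(x, ...){(x - min(x, ...)) / (max(x, ...) - min(x, ...))}

y <- c(2, NA, 6, 4)          # illustrative values with one missing entry

range01(y)                   # min() and max() return NA, so every element is NA
range01(y, na.rm = TRUE)     # NA is skipped for min/max but stays in its position
# [1] 0.0  NA 1.0 0.5
```

One edge case to keep in mind: if all values in the vector are equal, the denominator is zero and the result is NaN for every element, so a constant column may need separate handling.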
Gregor Thomas
Andrie