I have a dataset that I need to transform so that it follows a normal distribution.

First, generate an example dataset:

df <- runif(500, 0, 100)

Second, define a function. It keeps transforming df until the Shapiro-Wilk p-value exceeds the cutoff (0.10 in the code below), then assigns the transformed data to y in the global environment.

library(car)  # provides powerTransform()

BoxCoxTrans <- function(y)
{
    lambda <- 1
    constant <- 0
    while(shapiro.test(y)$p.value < 0.10) 
    {
        constant <- abs(min(y, na.rm = TRUE)) + 0.001
        y <- y + constant
        lambda <- powerTransform(y)$lambda
        y <- y ^ lambda
    }
    assign("y", y, envir = .GlobalEnv) 
}

Third, test df for normality:

shapiro.test(df)

Shapiro-Wilk normality test

data:  df
W = 0.95997, p-value = 2.05e-10

Because P < 0.05, I transform df:

BoxCoxTrans(df)

This gives me the following error message:

Error in qr.resid(xqr, w * fam(Y, lambda, j = TRUE)) : 
NA/NaN/Inf in foreign function call (arg 5)

What did I do wrong?

HQ L

2 Answers

You could use a Box-Muller transformation to generate normally distributed values from uniformly distributed random numbers. This might be more appropriate than a Box-Cox transformation, which AFAIK is typically applied to make a skewed distribution closer to normal.

Here's an example of a Box-Muller Transformation applied to a set of uniformly distributed numbers:

set.seed(1234)
size <- 5000
a <- runif(size)
b <- runif(size)
y <- sqrt(-2 * log(a)) * cos(2 * pi * b)
plot(density(y), main = "Example of Box-Muller Transformation", xlab="x", ylab="f(x)")
library(nortest)
#> lillie.test(y)
#
#   Lilliefors (Kolmogorov-Smirnov) normality test
#
#data:  y
#D = 0.009062, p-value = 0.4099
#
#> shapiro.test(y)
#
#   Shapiro-Wilk normality test
#
#data:  y
#W = 0.99943, p-value = 0.1301
#
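If the goal is to map the existing values in df themselves onto a normal scale, rather than to draw fresh normal numbers, one option not covered above is the inverse-CDF (probability integral) transform. This is only a minimal sketch, assuming the data really are uniform on (0, 100) as in the question; the seed and the names u and z are chosen for illustration.

```r
set.seed(1234)                 # arbitrary seed, only so the sketch is repeatable
df <- runif(500, 0, 100)       # same kind of data as in the question

# Step 1: map the data to (0, 1) with the uniform CDF they are assumed to follow
u <- punif(df, min = 0, max = 100)

# Step 2: push those probabilities through the standard normal quantile function
z <- qnorm(u)

shapiro.test(z)                # compare with shapiro.test(df)
```

When the true distribution is not known, a rank-based variant such as qnorm((rank(df) - 0.5) / length(df)) is a common substitute, at the cost of discarding everything except the ordering of the values.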

Hope this helps.

RHertel

Add

 print(summary(y))

before the end of your while loop and watch your computation explode. In any event, repeatedly applying Box-Cox makes no sense, because the first application already gives you the ML(-like) estimator of the transformation parameter. Moreover, why would you expect a power transformation to normalize a uniform distribution?
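To illustrate the point about repeated application, here is a rough base-R sketch. It does not use powerTransform(); instead it assumes a hand-rolled Box-Cox profile log-likelihood for an intercept-only model, and the helper names boxcox_loglik and estimate_lambda are hypothetical. It also skips the constant shift from the question, since the simulated data are already positive.

```r
set.seed(1234)                      # arbitrary seed for repeatability
y <- runif(500, 0, 100)             # strictly positive, so no shift is needed

# Profile log-likelihood of the Box-Cox parameter for an intercept-only model
boxcox_loglik <- function(lambda, y) {
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

# Hypothetical helper: ML estimate of lambda on a fixed search interval
estimate_lambda <- function(y) {
  optimize(boxcox_loglik, interval = c(-2, 2), y = y, maximum = TRUE)$maximum
}

lambda1 <- estimate_lambda(y)       # first (and only meaningful) estimate
y1 <- y^lambda1                     # power transform, as in the question
lambda2 <- estimate_lambda(y1)      # re-estimate on the transformed data

lambda1
lambda2                             # should stay close to 1: nothing left to gain
shapiro.test(y1)$p.value            # bounded data are still expected to fail
```

Under these assumptions the second estimate lands near 1, so a further pass would leave the data essentially unchanged, and the bounded, light-tailed shape of a uniform sample is not something a power transformation can remove.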

John

John Fox
  • Thank you! It is very helpful! Totally makes sense. I need to review my statistics book. – HQ L Jul 14 '15 at 19:38