2

I tried to speed the below code but without any success.

I read about Rfast package but I also fail in implementing that package.

Is there any way to optimise the following code in R?

RI<-function(y,x,a,mu,R=500,t=500){
  x <- as.matrix(x)
  dm <- dim(x)
  n <- dm[1]  
  bias1 <- bias2 <- bias3 <- numeric(t)
  b1 <- b2<- b3 <- numeric(R) 
  ### Outliers in Y ######
  for (j in 1:t) {
  for (i in 1:R) {
    id <- sample(n, a * n)
    z <- y
    z[id] <- rnorm(id, mu) 
    b1[i] <- var(coef(lm(z ~., data = as.data.frame(x))))
    b2[i] <- var(coef(rlm(z ~ ., data = data.frame(x), maxit = 2000, method = "MM")))
    b3[i] <- var(coef(rlm(z ~ ., data = data.frame(x), psi = psi.huber,maxit = 300)))
  }
      bias1[j] <- sum(b1)    ;   bias2[j] <- sum(b2);   bias3[j] <- sum(b3)
  }
  bias <- cbind("lm" = bias1,"MM-rlm" = bias2, "H-rlm" = bias3)
  colMeans(bias)
  }
#######################################
p <- 5
n <- 200
x<- matrix(rnorm(n * p), ncol = p)
y<-rnorm(n)
a=0.2
mu <-10
#######################################
RI(y,x,a,mu)
jeza
  • 299
  • 1
  • 4
  • 21
  • 2
    1) You start with `x <- as.matrix(x)` and in your inner loop you call `as.data.frame(x)` then `data.frame(x)` twice though `x` is never changed in the entire function. Get rid of the first instruction, `as.matrix` and call `as.data.frame` just once, before both loops. – Rui Barradas Feb 01 '21 at 12:31
  • 1
    2) What is `var(coef())`, statistically speaking? – Rui Barradas Feb 01 '21 at 12:36
  • 2
    If speed is important, don't use the formula methods of `lm` and `rlm`. Create the model matrix just once and pass it and the dependent variable to the default methods. However, if you profile the code you should see that most of the time is spent with the iterated re-weighted least squares step. That's your bottleneck and you can't optimize it with reasonable effort. It would be easier to parallelize the loop. – Roland Feb 01 '21 at 12:39
  • 2
    We are looping 250K times, it will be slow. Maybe consider [foreach package](https://cran.r-project.org/package=foreach)? To extend Rui's question, what is this function doing statistically? – zx8754 Feb 01 '21 at 12:39
  • @RuiBarradas, It is simply the variance of the model coefficients. – jeza Feb 01 '21 at 12:41
  • 1
    Yes, I can understand that but what I don't see is its meaning. Anyway, even if it had one, you are adding those variances and then returning the means of the sums. Is this what you really want, not the means of the values computed in the inner loop? – Rui Barradas Feb 01 '21 at 13:04
  • @RuiBarradas, still the same problem. – jeza Feb 01 '21 at 13:05
  • @RuiBarradas, yes exactly. – jeza Feb 01 '21 at 13:08
  • I agree with @RuiBarradas' question on the meaning of this. But I'll assume you're sure of what you want. – Pierre Gramme Feb 02 '21 at 10:46
  • One more thing: what you have written can be expressed with only one loop level `for (i=seq(R*t))` if you return `bias1=sum(b1)/t`, etc. It shows more explicitly that you are repeating 250K times the same. Can't you decrease the iterations? – Pierre Gramme Feb 02 '21 at 10:46
  • @PierreGramme, Simply, to check which Regression Method is more robust to the outliers. I will examine many scenarios. – jeza Feb 02 '21 at 11:04

0 Answers0