
I have a huge dataset. I fit a multinomial regression with multinom() from the nnet package:

mylogit <- multinom(to ~ RealAge, mydata)

Fitting takes 10 minutes, but when I use the summary() function to extract the coefficients, it takes more than a day! This is the code I used:

output <- summary(mylogit)
Coef <- t(as.matrix(output$coefficients))

Does anybody know how I can speed up this part of the code, for example with parallel processing in R?

This is a small sample of the data:

mydata:
to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356
0   89.03
513 92.117
69  70.243
253 88.482
88  64.23
513 64
4   84.03
65  65.246
69  81.235
513 87.663
513 81.21
17  75.235
117 49.112
69  59.019
20  90.03
Stedy

1 Answer


If you just want the coefficients, use the coef() method instead, which does far less computation.

Example:

mydata <- readr::read_table("to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356
0   89.03
513 92.117
69  70.243
253 88.482
88  64.23
513 64
4   84.03
65  65.246
69  81.235
513 87.663
513 81.21
17  75.235
117 49.112
69  59.019
20  90.03")[rep(1:20, 3000), ]

mylogit <- nnet::multinom(to ~ RealAge, mydata)
system.time(output <- summary(mylogit))          # 6 sec
all.equal(output$coefficients, coef(mylogit))    # TRUE & super fast
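
The transposed coefficient matrix from the question can then be produced without calling summary() at all. A minimal sketch, reusing the variable names from the question:

```r
# Equivalent to t(as.matrix(summary(mylogit)$coefficients)), but skips
# the expensive standard-error computation that summary() performs:
Coef <- t(coef(mylogit))
```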

If you profile the summary() function, you'll see that most of the time is spent in the crossprod() function. So, if you really want the full output of summary(), you could use an optimized math library, such as the MKL provided by Microsoft R Open.
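
A quick way to verify this yourself is with base R's sampling profiler. A sketch, assuming `mylogit` was fit as above (the profile file name is arbitrary):

```r
# Profile the summary() call and report time per function;
# crossprod() should dominate the by-self timings.
Rprof("summary_prof.out")
output <- summary(mylogit)
Rprof(NULL)
head(summaryRprof("summary_prof.out")$by.self)
```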

F. Privé