
I am trying to run the naiveBayes classifier from the R package e1071. I am running into an issue where prediction takes longer than training by a factor of ~300.

I was wondering if anyone else has observed this behavior and, if so, if you have any suggestions on how to improve it.

This issue appears only in some instances. Below, I have code that trains and predicts the NB classifier on the Iris dataset. Here the training and prediction times match up quite closely (prediction takes 10x longer instead of 300x longer). The only other trace of this issue that I could find online is here. In that instance, the answer was to make sure that categorical variables are formatted as factors. I have done this, but still don't see any improvement.

I have played around with the sample size N and the problem seems to be lessened as N decreases. Perhaps this is intended behavior of the algorithm? Decreasing N by a factor of 10 causes the prediction to be only 150x slower, but increasing by a factor of 10 yields a similar slowdown of 300x. These numbers seem crazy to me, especially because I've used this algorithm in the past on datasets with ~300,000 examples and found it to be quite fast. Something seems fishy but I can't figure out what.

I'm using R version 3.3.1 on Linux. The e1071 package is up-to-date (2015 release).

The code below should be reproducible on any machine. FYI my machine timed the Iris training at 0.003s, the Iris prediction at 0.032s, the simulated-data training at 0.045s, and the corresponding prediction at 15.205s. If you get different numbers than these, please let me know, as it could be some issue on my local machine.

# Remove everything from the environment and clear out memory
rm(list = ls())
gc()

# Load required packages and datasets
require(e1071)
data(iris)

# Custom function: tic/toc function to time the execution
tic <- function(gcFirst = TRUE, type=c("elapsed", "user.self", "sys.self"))
{
  type <- match.arg(type)
  assign(".type", type, envir=baseenv())
  if(gcFirst) gc(FALSE)
  tic <- proc.time()[type]         
  assign(".tic", tic, envir=baseenv())
  invisible(tic)
}

toc <- function()
{
  type <- get(".type", envir=baseenv())
  toc <- proc.time()[type]
  tic <- get(".tic", envir=baseenv())
  print(toc - tic)
  invisible(toc)
}

# set seed for reproducibility
set.seed(12345)

#---------------------------------
# 1. Naive Bayes on Iris data
#---------------------------------
tic()
model.nb.iris <- naiveBayes(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=iris)
toc()
tic()
pred.nb.iris <- predict(model.nb.iris, iris, type="raw")
toc()

#---------------------------------
# 2. Simulate data and reproduce NB error
#---------------------------------
# Hyperparameters
L <- 5   # no. of locations
N <- 1e4*L

# Data
married        <- 1*(runif(N,0.0,1.0)>.45)
kids           <- 1*(runif(N,0.0,1.0)<.22)
birthloc       <- sample(1:L,N,TRUE)
major          <- 1*(runif(N,0.0,1.0)>.4)
exper          <- 15+4*rnorm(N)
exper[exper<0] <- 0
migShifter     <- 2*runif(N,0.0,1.0)-1
occShifter     <- 2*runif(N,0.0,1.0)-1
X <- data.frame(rep.int(1,N),birthloc,migShifter,occShifter,major,married,kids,exper,exper^2,exper^3)
colnames(X)[1] <- "constant"
rm(married)
rm(kids)
rm(birthloc)
rm(major)
rm(exper)
rm(occShifter)

# Parameters and errors
Gamma <- 15*matrix(runif(7*L), nrow=7, ncol=L)
eps <- matrix(rnorm(N*L, 0, 1), nrow=N, ncol=L)

# Deterministic portion of probabilities
u <- matrix(rep.int(0,N*L), nrow=N, ncol=L)
for (l in 1:L) {
    u[ ,l] = (X$birthloc==l)*Gamma[1,l] +
             X$major*Gamma[2,l]      + X$married*Gamma[3,l] +
             X$kids*Gamma[4,l]       + X$exper*Gamma[5,l] +
             X$occShifter*Gamma[6,l] + X$migShifter*X$married*Gamma[7,l] +
             eps[ ,l]
}

choice <- apply(u, 1, which.max)

# Add choice to data frame
dat <- cbind(choice,X)

# factorize categorical variables for estimation
dat$major      <- as.factor(dat$major)
dat$married    <- as.factor(dat$married)
dat$kids       <- as.factor(dat$kids)
dat$birthloc   <- as.factor(dat$birthloc)
dat$choice     <- as.factor(dat$choice)

tic()
model.nb <- naiveBayes(choice~birthloc+major+married+kids+exper+occShifter+migShifter,data=dat,laplace=3)
toc()
tic()
pred.nb <- predict(model.nb, dat, type="raw")
toc()
  • If you don't need the conditional a-posteriori probabilities for each class, it's a little faster (~11 secs on my machine as opposed to ~15 secs). – Sandipan Dey Nov 30 '16 at 11:19
  • Thanks, @sandipan! I actually do need these, but I appreciate you running the code on your machine! – Tyler R. Nov 30 '16 at 11:34
  • **Update:** According to the package maintainer, these computation times are not surprising. Everything appears to be functioning as intended. – Tyler R. Nov 30 '16 at 18:25
  • Okay, but being a generative model, shouldn't it take more time in training than in prediction? It's a little counter-intuitive. – Sandipan Dey Nov 30 '16 at 18:34
  • @sandipan I think the reasoning is that the prediction step requires searching through each example and finding the set of marginal posterior probabilities that match the given combination of features for that example. This is apparently where the slowness is coming from. – Tyler R. Dec 04 '16 at 17:18
  • I thought that to classify an example x = x_1 x_2 ... x_n, prediction only requires computing argmax_j prod_i P(x_i|C_j), where the C_j are the classes and the x_i are the features (using conditional independence and Bayes' theorem), and the P(x_i|C_j) values have already been computed during training. – Sandipan Dey Dec 04 '16 at 17:39
  • @sandipan The training step is fast and is simply a set of conditional means, as you mentioned. But the prediction step is essentially a many-to-one merge between two data frames (the test set and the small set of conditional probabilities). My suspicion is that this merging is done via loops rather than other tried-and-true database joining algorithms. If I have time in the next little while, I'll try my hand at manually merging at the prediction step to see if I can get a speedup. – Tyler R. Dec 05 '16 at 18:21
  • Also, using dictionaries (hash maps) to store the MLE parameters learnt at training time would speed up the process, because prediction would then be just a matter of fast lookups. – Sandipan Dey Dec 05 '16 at 18:24
  • @SandipanDey: can you explain how we can store this in dictionaries for fast lookup? – Hardik Gupta Aug 16 '17 at 09:21
  • You can check out another Naive Bayes implementation :) https://cran.r-project.org/web/packages/naivebayes/index.html – Michal Majka Mar 13 '19 at 21:01
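
To make the manual-lookup idea from the last few comments concrete, here is a minimal sketch (not e1071's actual code) that computes the raw posteriors directly from the fitted model's stored tables: e1071 keeps a class-by-level probability table for each factor predictor and a class-by-(mean, sd) matrix for each numeric predictor in model$tables, plus the class counts in model$apriori. The helper name manual_predict_raw is invented for illustration, and it omits the NA and threshold handling that predict.naiveBayes performs, so treat it as a sketch rather than a drop-in replacement.

# Sketch: manual, vectorized computation of raw naive Bayes posteriors
# from a fitted e1071 naiveBayes model (helper name is hypothetical)
manual_predict_raw <- function(model, newdata) {
  classes <- rownames(model$tables[[1]])
  # start every row from the log prior (class counts stored in model$apriori)
  logpost <- matrix(log(model$apriori / sum(model$apriori)),
                    nrow = nrow(newdata), ncol = length(classes), byrow = TRUE)
  colnames(logpost) <- classes
  for (feat in names(model$tables)) {
    tab <- model$tables[[feat]]
    x   <- newdata[[feat]]
    for (j in seq_along(classes)) {
      if (is.factor(x)) {
        # factor predictor: one vectorized lookup into the class-by-level table
        logpost[, j] <- logpost[, j] + log(tab[j, as.character(x)])
      } else {
        # numeric predictor: Gaussian density with the stored class mean and sd
        logpost[, j] <- logpost[, j] + dnorm(x, tab[j, 1], tab[j, 2], log = TRUE)
      }
    }
  }
  # normalize per row to get posterior probabilities (as with type = "raw")
  p <- exp(logpost - apply(logpost, 1, max))
  p / rowSums(p)
}

tic()
pred.nb.manual <- manual_predict_raw(model.nb, dat)
toc()

If you only need predicted classes rather than probabilities, calling predict(model.nb, dat, type="class") instead of type="raw" is also somewhat faster, as noted in the first comment.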

1 Answer


I ran into the same problem. I needed to run naive Bayes and predict many times (thousands of runs) on some big matrices (10,000 rows, 1000-2000 cols). Since I had some time, I decided to write my own implementation of naive Bayes to make it a little faster:

https://cran.r-project.org/web/packages/fastNaiveBayes/index.html

I put some more work into this and turned it into a package (linked above). It is now around 330 times faster using a Bernoulli event model. Moreover, it implements a multinomial event model (even a bit faster) and a Gaussian model (slightly faster). Finally, there is a mixed model, where it's possible to use different event models for different columns and combine them!
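
For reference, a rough usage sketch on the simulated data from the question. The call assumes the package's main fastNaiveBayes(x, y) interface with a predictor matrix x and a factor y; treat the exact argument names as an assumption and check the package documentation.

# Rough sketch, assuming fastNaiveBayes(x, y, laplace = ...) and the usual
# predict(..., type = "raw") interface; x is a dummy-coded predictor matrix
library(fastNaiveBayes)

y <- dat$choice
x <- model.matrix(~ birthloc + major + married + kids + exper +
                    occShifter + migShifter, data = dat)[, -1]  # drop intercept

tic()
model.fnb <- fastNaiveBayes(x, y, laplace = 3)
toc()
tic()
pred.fnb <- predict(model.fnb, newdata = x, type = "raw")
toc()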

The reason e1071 is so slow in the predict function is that it essentially uses a double for loop. There was already a pull request open from around the beginning of 2017 that vectorized at least one of these loops, but it had not been accepted yet.

Martin Skogholt