Why does R tell me I have NAs in my prob distribution when I call the sample() function?

Question

I am running into an issue when I try to run the function below. The exact error I am getting is: Error in sample.int(length(x), size, replace, prob) : NA in probability vector

I used the print(t) line to see where it stops, which seems to be around the 10th iteration. At that point I checked whether there are any NA values in my w probability vector, but there aren't. The minimum value is somewhere on the order of 10e-5.

Does anyone have any ideas what is causing this error? Is it possible that the values in the prob vector are so small that R interprets them as NA?

My call to the function:

boosted_prediction <- boost_LS(x_train, y_train, x_test, 1500)

My function:

boost_LS <- function (x, y, x_test, ts) {
  n <- nrow(x)
  w <- matrix(rep(1 / n, n), n, 1)
  boost_pred <- matrix(0, nrow(x_test), 1)
  for (t in 1:ts) {
    bootstrap_index <- sample(1:n, size = n, replace = TRUE, prob = w)
    bootstrap_x <- as.matrix(x[bootstrap_index, ])
    bootstrap_y <- as.matrix(y[bootstrap_index])
    ls_w <- solve(t(bootstrap_x) %*% bootstrap_x) %*% t(bootstrap_x) %*% bootstrap_y
    pred <- sign(bootstrap_x %*% ls_w)
    e_t <- sum(w[bootstrap_y != pred])
    a_t <- 0.5 * log((1 - e_t) / e_t)
    w_hat <- matrix(0, n, 1)
    for (i in 1:n) {
      w_hat[i, 1] <- w[i, 1] * exp(-a_t * bootstrap_y[i, 1] * pred[i, 1])
    }
    w <- w_hat / sum(w_hat)
    boost_pred <- boost_pred + (a_t * (x_test %*% ls_w))
  #  print(t)
  }
  return(sign(boost_pred))
}

EDIT: So, I've found that my error rate (e_t) goes to 0 after 6-7 iterations, so the weight (a_t) that I use to update the probability vector goes to Inf, which is messing up my probability vector...
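To make the mechanism concrete, here is a minimal sketch of what seems to happen in one iteration once nothing is misclassified. The labels and the size n = 5 are made up for illustration; only the formulas are taken from the function above.

```r
# Toy reproduction: every point is classified correctly (made-up data).
n <- 5
w <- rep(1 / n, n)          # uniform starting weights
y <- c(1, -1, 1, 1, -1)     # hypothetical labels
pred <- y                   # perfect predictions, so nothing is misclassified

e_t <- sum(w[y != pred])              # sum over an empty selection = 0
a_t <- 0.5 * log((1 - e_t) / e_t)     # log(1 / 0) = Inf
w_hat <- w * exp(-a_t * y * pred)     # exp(-Inf) = 0 for every point
w_new <- w_hat / sum(w_hat)           # 0 / 0 = NaN

print(a_t)
print(anyNA(w_new))   # is.na() also reports NaN as missing
```

So the weights do not become "too small". They become NaN in a single step, and sample() rejects any non-finite entry in prob on the next iteration.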

This is now less of a debugging question and more of a logic question about the AdaBoost algorithm. If anyone has any hints, they would be greatly appreciated!
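One common workaround, sketched here as an assumption rather than as the canonical fix, is to keep e_t strictly inside (0, 1) before taking the log. The helper name safe_alpha and the tolerance eps are hypothetical and not part of the original function.

```r
# Hypothetical helper: clamp the weighted error away from 0 and 1
# so that the log stays finite. eps is an arbitrary small tolerance.
safe_alpha <- function(e_t, eps = 1e-10) {
  e_t <- min(max(e_t, eps), 1 - eps)
  0.5 * log((1 - e_t) / e_t)
}

a_zero <- safe_alpha(0)     # large but finite instead of Inf
a_half <- safe_alpha(0.5)   # 0.5 * log(1) = 0
print(is.finite(a_zero))
print(a_half)
```

Inside boost_LS this would replace the line that computes a_t. Separately, it may be worth checking how e_t and the weight update are indexed: w[i] belongs to the i-th original training point, while bootstrap_y[i] and pred[i] belong to the i-th resampled point, so predicting on the full x and comparing against y might be closer to what AdaBoost intends.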
