Suppose I have a list of 150000 states with their zip codes, and I want to run my predictor model (basdata) on that list to get predictions of Area. I did this with the help of one gentleman, and here is my code:

pred <- sapply(1:nrow(first), function(row) { predict(basdata,first[row, ],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma })
  1. basdata: My Model
  2. first: My new data set for which I am predicting the area.

The issue I am facing is that this code takes a long time to predict the values, because it iterates over every row and calculates the area one row at a time. There are 150000 rows in my data set, and I would appreciate any help optimizing the performance of this code.

Phil
  • Is the predict function not vectorized? What happens when you do `predict(basdata,first[1:10,],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma`? Does it not give you 10 predictions? – Onyambu Jun 19 '20 at 17:08
  • I have 3 variables in the whole data set, and my model basdata is based on values from those three variables. I think if I just try predict as you suggested, I get some vague values, as the model doesn't iterate over each row of the whole data set. – Uttasarga Singh Jun 19 '20 at 17:11
  • Could you possibly explain what first[1:10] would help me achieve? – Uttasarga Singh Jun 19 '20 at 17:12
  • I do not understand your point. Just try the code above, then try iterating over the first 10 rows, i.e. `sapply(1:10, function(row) { predict(basdata,first[row, ],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma })`, and see whether the values are different. – Onyambu Jun 19 '20 at 17:13
  • It's `first[1:10,]`, not `first[1:10]`; take note of the comma after the 10. – Onyambu Jun 19 '20 at 17:14
  • I am sorry about that. I will apply this and let you know. Also, Ybma is the predicted value that I get for each row of zip, state and state_idx. So there are 150000 rows consisting of just these values, and I have built my model using a different data set but the same variables, i.e. zip, state and state_idx. – Uttasarga Singh Jun 19 '20 at 17:16
  • Yes. First try predicting the first 10 rows with the code I gave you up there, and compare the results with those from your own code for the first 10 rows, the way I described. If they match, that will show you that the predict function is vectorized (which it is supposed to be, since that is the aim of the R language). – Onyambu Jun 19 '20 at 17:20
  • Hello Sir, yes! Both pieces of code print the same values. The predict function is vectorized, as you mentioned. – Uttasarga Singh Jun 19 '20 at 17:25
  • So, I can use your given code to iterate over the data set with 150000 rows? – Uttasarga Singh Jun 19 '20 at 17:26
  • You do not need to iterate anything. Just do `predict(basdata,first,estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma`. This will give you all the predicted values for your dataframe – Onyambu Jun 19 '20 at 17:32
  • Ok, Sure! I will try this right away. and let you know. – Uttasarga Singh Jun 19 '20 at 17:34
  • That is the solution. If that takes long, then there is no shortcut, unless you reinvent the wheel. – Onyambu Jun 19 '20 at 17:35
  • It is showing that one of my variables, loc_st_prov_cd, has new levels. – Uttasarga Singh Jun 19 '20 at 17:35
  • I remember that one of the states, DC, was missing from my training set. I trained the model, and when I try to run it on my test set it shows this error that the variable has new levels (model.frame.default). – Uttasarga Singh Jun 19 '20 at 17:41
  • It worked! Thank you very much. Now, can you let me know how I can check which value is estimated for which row? – Uttasarga Singh Jun 19 '20 at 19:57
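
The "new levels" error mentioned in the comments typically appears when a factor in the prediction data contains a value the model never saw during training (here, the state DC). Below is a minimal sketch of how one might find such values before calling predict. The data frames `train` and `new_data` and all their values are made up for illustration; only the column name loc_st_prov_cd comes from the thread.

```r
# Hypothetical training and prediction data; only the column name
# loc_st_prov_cd comes from the thread, the values are made up.
train <- data.frame(
  loc_st_prov_cd = factor(c("NY", "CA", "TX", "NY")),
  area = c(1.0, 2.0, 3.0, 1.5)
)
new_data <- data.frame(
  loc_st_prov_cd = factor(c("NY", "DC", "CA", "DC"))
)

# Values present in the new data but absent from the training levels
unseen <- setdiff(
  unique(as.character(new_data$loc_st_prov_cd)),
  levels(train$loc_st_prov_cd)
)
print(unseen)

# Logical index of the rows whose level was seen during training
can_predict <- !(as.character(new_data$loc_st_prov_cd) %in% unseen)

# Keep only those rows and drop the now-unused factor levels
scorable <- droplevels(new_data[can_predict, , drop = FALSE])
```

Whether to drop the unseen rows, as sketched here, or to retrain the model on data that includes the missing state depends on what the predictions are for; retraining is usually the cleaner fix if DC rows need a prediction.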

1 Answer

I would like to thank Onyambu for providing the solution; I was just making the predict call more complex than necessary. The predict function is vectorized, so there is no need to iterate over each row: the following code predicts the values for every row of the data set in a single call, using the model built.

predict(basdata,first,estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma
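
To illustrate the check from the comments (row-by-row versus one vectorized call) and the follow-up question of which prediction belongs to which row, here is a small sketch. It uses `lm` from base R as a stand-in for the basdata model so that it runs without extra packages, and the data are simulated; `fit`, `train` and `new_data` are hypothetical names. For `lm`, predictions come back in the row order of the new data; the same is expected of the Ybma vector, but it is worth confirming on a few rows as was done in the comments.

```r
# Simulated data and an lm() fit used as a stand-in for the basdata model
set.seed(1)
train <- data.frame(x = runif(50))
train$y <- 2 * train$x + rnorm(50, sd = 0.1)
fit <- lm(y ~ x, data = train)

new_data <- data.frame(x = runif(10))

# Slow pattern: one predict() call per row
by_row <- sapply(seq_len(nrow(new_data)), function(i) {
  predict(fit, new_data[i, , drop = FALSE])
})

# Fast pattern: a single vectorized predict() call on the whole data frame
all_at_once <- predict(fit, new_data)

# Both approaches give the same values
stopifnot(isTRUE(all.equal(unname(by_row), unname(all_at_once))))

# Store the predictions as a column so each value sits next to its row
new_data$pred <- as.numeric(all_at_once)
head(new_data)
```

With the real model, the analogous step would be to assign the result of the single predict call to a new column of `first`, assuming the output has one value per row in the same order.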