I have a training data set with 700 records. I built the model on this data using the C5.0 function.

library(C50)
abc_model <- C5.0(abc_train[-5], abc_train$resultval)

I have test data with 5000 records. I am using the predict function to generate predictions for these 5000 records.

abc_test <- read.csv("FullData.csv", quote="")
abc_pred <- predict(abc_model, abc_test)

This gives me predictions for ONLY 700 records, not all 5000.

How can I make it predict for all 5000?

When the training set is larger than the test set, everything works: I get a prediction for every row, and I can combine the test data with the results and write the output to a .csv file. But when the training set is smaller than the test set, not all records are predicted.

 x <- data.frame(abc_test, abc_pred)

Any ideas on how to overcome this problem? I am not an expert in R, so any suggestions will help me a lot.


Thanks Richard.

Below are a few records from my training data.

Id       Value1  Value2  Country        Result
20835    63      1       United States  yes
3911156  60      12      Romania        no
39321    10      3       United States  no
29425    80      9       Australia      no

Below are a few records from my test data.

Id       Value1  Value2  Country
3942587  114     12      United States
3968314  25      13      Sweden
3973205  83      10      Russian Federation
17318    159     9       Russian Federation

I am trying to predict the Result value and append it to my test data. But, as described above, I get a Result for only 700 records, not all 5000.

user20650
user3473975
  • Welcome to SO. It's always good practice to include some of your data, usually as the output from `dput(yourData)`. This enables us to reproduce your issue to help diagnose the problem. – Rich Scriven Mar 28 '14 at 19:30
  • I am also getting the error 'predict code called exit with value 1' after executing the line abc_pred <- predict(abc_model, abc_test) – user3473975 Mar 28 '14 at 20:01
  • Correction: the predict function actually exits after processing 634 rows with the error 'predict code called exit with value 1'. It does not even get through 700 rows. Does anyone know when this error might occur? – user3473975 Mar 28 '14 at 20:29

1 Answer

You should try this:

str(abc_train)
str(abc_test)
lapply(abc_train[names(abc_train) != "Result"], table)
lapply(abc_test, table)

Then you will probably find that some levels of some variables in abc_test do not occur in abc_train, so no estimates could be produced for those rows. I'm guessing you expected the numeric values to be handled as they would be in a regression, but that won't happen in any prediction function if those columns are factors, and it may never happen, depending on the function's behavior. Looking at C50::C5.0.default, it appears there may be no regression option for variables.
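The level comparison described above can also be done programmatically instead of by eye. The following is only a minimal sketch in base R: the two small data frames are hypothetical stand-ins built from the sample rows in the question, and the names predictors, cat_cols, unseen_levels and has_unseen are introduced here purely for illustration. It assumes the outcome column is called Result, as in the sample data.

```r
# Hypothetical stand-ins for abc_train / abc_test, built from the sample rows
# shown in the question (the real data has 700 and 5000 rows).
abc_train <- data.frame(
  Id      = c(20835, 3911156, 39321, 29425),
  Value1  = c(63, 60, 10, 80),
  Value2  = c(1, 12, 3, 9),
  Country = factor(c("United States", "Romania", "United States", "Australia")),
  Result  = factor(c("yes", "no", "no", "no"))
)
abc_test <- data.frame(
  Id      = c(3942587, 3968314, 3973205, 17318),
  Value1  = c(114, 25, 83, 159),
  Value2  = c(12, 13, 10, 9),
  Country = factor(c("United States", "Sweden",
                     "Russian Federation", "Russian Federation"))
)

# Predictor columns that are categorical (factor or character) in the training data.
# Assumption: the outcome column is named "Result".
predictors <- setdiff(names(abc_train), "Result")
is_cat <- vapply(abc_train[predictors],
                 function(col) is.factor(col) || is.character(col),
                 logical(1))
cat_cols <- predictors[is_cat]

# For each categorical column, collect the test values never seen in training.
unseen_levels <- lapply(setNames(cat_cols, cat_cols), function(col) {
  setdiff(unique(as.character(abc_test[[col]])),
          unique(as.character(abc_train[[col]])))
})
print(unseen_levels)

# Flag the test rows that contain at least one unseen value.
has_unseen <- rep(FALSE, nrow(abc_test))
for (col in cat_cols) {
  has_unseen <- has_unseen |
    as.character(abc_test[[col]]) %in% unseen_levels[[col]]
}
print(abc_test[has_unseen, ])
```

If has_unseen flags rows in the real data, those rows are the likely candidates for the failure. Whether to drop them, recode the unseen values, or retrain on data that covers them is a modelling decision that this sketch does not make.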

IRTFM