I have a personal dataset, which I split into the variable to predict and the predictors. This is the code:

library(Cubist)
str(A)
'data.frame':   6038 obs. of  3 variables:
 $ ads_return_count : num  7 10 10 4 10 10 10 10 10 9 ...
 $ actual_cpc       : num  0.0678 0.3888 0.2947 0.0179 0.095 ...
 $ is_user_agent_bot: Factor w/ 1 level "False": 1 1 1 1 1 1 1 1 1 1 ...
cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"])

I get the following error:

cubist code called exit with value 1
Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds

Is there something I am missing?

Spacedman
Sourav Sarkar

2 Answers

Simulate some data to make a reproducible example:

A=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(rep("False",100)))

cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"])
cubist code called exit with value 1
Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds

Great, now we're on the same page.

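What bothers me is that the is_user_agent_bot predictor is all "False". (The outcome, passed as the second argument, is actual_cpc; is_user_agent_bot is one of the predictors.) A factor with only one level carries no information for the model. Let's try something with two levels: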

> A2=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(sample(c("True","False"),100,TRUE)))
> cubist(A2[,c("ads_return_count","is_user_agent_bot")],A2[,"actual_cpc"])

Call:
cubist.default(x = A2[, c("ads_return_count", "is_user_agent_bot")], y =
 A2[, "actual_cpc"])

Number of samples: 100 
Number of predictors: 2 

Number of committees: 1 
Number of rules: 1 

I would say this is an uninformative error message from cubist, caused by a factor predictor that has only one level.
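
As a minimal sketch of the workaround, assuming the cause really is a predictor with only one distinct value, the constant columns can be detected and dropped before the model is fitted. The names `is_constant`, `dropped` and `predictors_ok` are made up for this illustration, and the check uses base R only; the cubist call at the end runs only if the Cubist package is installed.

```r
# Simulated data with the same shape as the data frame in the question.
set.seed(1)
A <- data.frame(
  ads_return_count  = sample(100, 100, TRUE),
  actual_cpc        = runif(100),
  is_user_agent_bot = factor(rep("False", 100))
)

predictors <- A[, c("ads_return_count", "is_user_agent_bot"), drop = FALSE]

# Flag columns that take fewer than two distinct non-missing values
# (is_constant is a hypothetical helper name, not part of Cubist).
is_constant <- vapply(
  predictors,
  function(col) length(unique(col[!is.na(col)])) < 2,
  logical(1)
)

dropped       <- names(predictors)[is_constant]
predictors_ok <- predictors[, !is_constant, drop = FALSE]

# Fit only when Cubist is available; the call mirrors the one in the question.
if (requireNamespace("Cubist", quietly = TRUE)) {
  fit <- Cubist::cubist(x = predictors_ok, y = A$actual_cpc)
  print(fit)
}
```

Checking observed values rather than `nlevels()` also catches factors that declare several levels but only ever use one of them.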

Spacedman
  • I actually had a lot of columns in my data frame, so maybe this column itself is causing the problem. Also, when I use all my columns, including the possibly single-valued one, I still get the error, which does not happen with other packages such as rpart. I will test and report back :). However, my dataset has many categorical variables with many levels; is that a problem for Cubist? – Sourav Sarkar Jun 23 '15 at 14:13
  • You were right, I just removed the columns with single levels and it's working fine. But this behaviour is not expected, right? – Sourav Sarkar Jun 23 '15 at 14:41
  • It would be nicer if the package went "Hey, how can I predict an outcome when there's only one outcome?" - suggest you contact the maintainer and make a suggestion. – Spacedman Jun 23 '15 at 15:54
0

I had the same issue, except that in my case the cause was a factor level whose name was the empty string "". Replacing those level names with text did the trick.
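
A rough sketch of that fix in base R, assuming the offending levels are exactly the empty string: the helper `fix_empty_levels` and the placeholder name "missing" are invented for this example and are not part of Cubist.

```r
# Rename empty-string factor levels in every factor column of a data frame.
# fix_empty_levels and the "missing" placeholder are hypothetical choices.
fix_empty_levels <- function(df, placeholder = "missing") {
  for (nm in names(df)) {
    if (is.factor(df[[nm]])) {
      lv <- levels(df[[nm]])
      lv[lv == ""] <- placeholder
      levels(df[[nm]]) <- lv
    }
  }
  df
}

# Made-up example data: one factor column with an empty level name.
d <- data.frame(
  colour = factor(c("red", "", "blue", "", "red")),
  value  = c(1.5, 2.0, 0.3, 4.1, 2.2)
)

d_fixed <- fix_empty_levels(d)
levels(d_fixed$colour)
```

If the empty strings actually stand for missing data, converting them to `NA` may be the better choice; whether Cubist then handles them as intended is worth checking on your own data.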

There seems to be a similar issue with C5.0 decision trees: "C5.0 decision tree - c50 code called exit with value 1".