I use the partykit package and came across the following error message:
Error in matrix(0, nrow = mi, ncol = nl) :
invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(0, nrow = mi, ncol = nl) :
NAs introduced by coercion to integer range
I used the example given in this article, which compares packages and how they handle variables with many categories.
The problem is that the splitting variable has too many categories. Within the mob() function, a matrix with all possible splits is created. This matrix alone is of size p * (2^(p-1)-1), where p is the number of categories of the splitting variable.
Depending on the available system resources (RAM etc.), the error occurs for different values of p.
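To make the growth concrete, here is a rough sketch of the arithmetic in R. It only evaluates the formula above and does not call partykit. The helper name split_matrix_cells is made up for illustration, and I am assuming that mi in the error message is the number of candidate splits 2^(p-1)-1, that nl is p, and that each cell is stored as an 8-byte double.

```r
# Hypothetical helper (not part of partykit): number of cells in a
# matrix with 2^(p-1) - 1 rows (candidate binary splits) and p columns.
# Plain doubles are used so the arithmetic itself does not overflow.
split_matrix_cells <- function(p) p * (2^(p - 1) - 1)

p <- c(5, 10, 20, 30, 32, 33)
n_splits <- 2^(p - 1) - 1

growth <- data.frame(
  p = p,
  n_splits = n_splits,
  cells = split_matrix_cells(p),
  # Approximate size in GiB, assuming 8 bytes per cell.
  approx_gib = split_matrix_cells(p) * 8 / 2^30,
  # matrix() needs nrow to be representable as an R integer.
  nrow_fits_integer = n_splits <= .Machine$integer.max
)
print(growth)
```

Under these assumptions the row count stops being representable as an R integer at p = 33, which would match the "NAs introduced by coercion to integer range" warning, while for somewhat smaller p the allocation would be expected to fail for lack of memory instead.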
The article suggests using the Gini criterion. Given the purpose of the partykit package, I think the Gini criterion cannot be used here, because I do not have a classification problem with a target variable, but a model specification problem.
My question therefore: is there a way to find the split in such cases, or a way to reduce the number of splits to check?