The parameter minCases of the function C5.0Control in the C50 R package is defined as:
an integer for the smallest number of samples that must be put in at least two of the splits.
How is this implemented? I assume that "split" in this context refers to the child nodes resulting from the split operation. Contrary to what I would have expected, minCases does not seem to represent the smallest number of cases that must be put in at least one node.
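To make my reading of the documentation concrete, here is a minimal sketch of the rule as I understand it. It is not taken from the C50 sources: the function name split_is_admissible and its parameters are hypothetical, and I am only assuming that "at least two of the splits" means "at least two branches of a candidate split".

```c
#include <stddef.h>

/* Hypothetical helper, NOT from the C50 sources: returns 1 if at least two
   branches of a candidate split hold min_cases or more (weighted) cases,
   and 0 otherwise. This is only one reading of the documentation. */
int split_is_admissible(const double *branch_cases, size_t n_branches,
                        double min_cases)
{
    size_t big_enough = 0;

    /* Count the branches that reach the threshold. */
    for (size_t v = 0; v < n_branches; v++) {
        if (branch_cases[v] >= min_cases) {
            big_enough++;
        }
    }

    return big_enough >= 2;
}
```

Under this reading, with a threshold of 2 a split with branch weights {1, 5, 4} would be accepted even though one branch holds a single case, while {1, 1, 8} would be rejected. Whether C5.0 actually behaves this way is exactly what I am unsure about.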
I have tried to find the implementation in the C source code. The parameter minCases seems to correspond to the variable MINITEMS, which is declared in extern.h at line 33:
extern CaseCount MINITEMS, LEAFRATIO;
It is used, for instance, in prune.c, lines 249 and 250:
    if (BranchCases[v] < MINITEMS) {
        ForEach(i, Bp, Ep) {
            SmallBranches[Class(Case[i])] += Weight(Case[i]);
        }
    }
What does minCases really do?