A couple of questions for the rpart and party experts.

1) I am trying to understand how the control parameter "minbucket" differs between rpart and party. Is it correct that minbucket in rpart is unweighted, i.e. it counts observations even when weights are supplied to fit the tree?

2) Can anyone briefly describe how the weights are used in the rpart algorithm? I downloaded the source code and tried to review it, but as a newbie I couldn't make much sense of it. rpart calls a C function (C_rpart), which seems to do most of the work, but I couldn't find more information about it.

Thanks so much in advance.

Lluís Ramon
AriesV

1 Answer

The weights parameter in rpart (and in most other machine learning algorithms) can be considered equivalent to duplicating each training row that many times: a weight of 5 is the same as having that row repeated 5 times. If the weights are whole numbers and your data set is small enough, you can build the expanded data explicitly with some simple code:

# Repeat each row index data$weights[i] times (assumes non-negative integer weights)
data[rep(seq_len(nrow(data)), times = data$weights), ]
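
As a quick sanity check of the row-replication idea, here is a minimal sketch on a made-up three-row data frame. The values and the variable name `expanded` are hypothetical; only the `weights` column name comes from the line above.

```r
# Hypothetical toy data; the column name `weights` matches the snippet above.
data <- data.frame(x = c(10, 20, 30), weights = c(1, 3, 2))

# Each row index i is repeated data$weights[i] times.
expanded <- data[rep(seq_len(nrow(data)), times = data$weights), ]

nrow(expanded)     # 6, the sum of the weights
table(expanded$x)  # counts per x value: 1, 3 and 2
```

The expanded data frame has as many rows as the weights sum to, so this only stays practical while that total is small.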
Craig
    I have the same question as the original poster, but I don't think this answer checks out. My data has a minimum weight of 500. When I set minbucket to 500, I get no tree at all, so it can't be working in terms of weight. When I set it back to 8 (near the default) I get a reasonable tree. – Soson May 25 '18 at 19:17
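
One way to test the behaviour described in that comment is to fit the same weighted tree with a small and a large minbucket and compare the results. The sketch below is an experiment, not an explanation of rpart's internals. It assumes the `kyphosis` data set shipped with rpart (81 rows), and the constant weight of 500 and the names `d`, `w`, `fit_small` and `fit_large` are made up for illustration. The rpart documentation describes minbucket as the minimum number of observations in a terminal node, so if it counts rows rather than summed weights, the second fit should not split at all.

```r
library(rpart)

d <- kyphosis   # example data shipped with rpart (81 rows)
d$w <- 500      # hypothetical constant case weight, as in the comment

# minbucket near the default: splits are possible if rows are counted
fit_small <- rpart(Kyphosis ~ Age + Number + Start, data = d, weights = w,
                   control = rpart.control(minbucket = 8))

# minbucket = 500: more than the 81 rows, far less than the total weight
fit_large <- rpart(Kyphosis ~ Age + Number + Start, data = d, weights = w,
                   control = rpart.control(minbucket = 500))

# One row of $frame per node; a single row means a root-only tree
n_nodes_small <- nrow(fit_small$frame)
n_nodes_large <- nrow(fit_large$frame)
print(c(small = n_nodes_small, large = n_nodes_large))

# The root node reports the row count (n) and the summed weight (wt) separately
print(fit_large$frame[1, c("n", "wt")])
```

A root-only `fit_large` alongside a root weight of 81 * 500 would suggest that minbucket is compared against the number of rows, while the weights enter the fit elsewhere.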