The code below oversamples houses with more than 10 rooms. May I ask what prob = ifelse(housing.df$ROOMS>10, 0.9, 0.01) means? Thanks a lot.

s <- sample(row.names(housing.df), 5, prob = ifelse(housing.df$ROOMS > 10, 0.9, 0.01))
housing.df[s, ]
Lea DM

1 Answer

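I imagine the purpose of this code is to check, for each house in the data set, whether it has more than ten rooms. If it does, the house gets a sampling weight of 0.9; otherwise it gets a weight of 0.01. These are relative weights rather than true probabilities, since sample rescales them to sum to one.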

sample will then draw from the row names using these weights, thus favouring the houses with more than ten rooms. This creates your oversample.
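To make the weighting concrete, here is a minimal sketch on made-up data. The toy housing.df below and the helper names w and p are assumptions for illustration only, not your actual data set.

```r
# Hypothetical toy data standing in for housing.df (not the asker's data)
set.seed(1)
housing.df <- data.frame(ROOMS = c(4, 5, 6, 6, 7, 7, 8, 9, 11, 12))

# One weight per row: 0.9 if the house has more than 10 rooms, else 0.01
w <- ifelse(housing.df$ROOMS > 10, 0.9, 0.01)

# sample() treats prob as relative weights; dividing by the total shows
# the chance each row has of being picked on the first draw
p <- w / sum(w)

# Draw 5 row names without replacement, favouring the large houses
s <- sample(row.names(housing.df), 5, prob = w)
housing.df[s, , drop = FALSE]
```

In this toy set the weights sum to 1.88, so each of the two large houses has about a 0.48 chance of being picked on the first draw, while each of the eight smaller houses has about 0.005.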

Is this what you mean?

MDEWITT