I've been learning R in RStudio and have been working on simple prediction modeling.

I receive the following error:

Invalid argument: 'sim' & 'obs' doesn't have the same length !

when I run this line of code:

rmse(testingbabydata$weight, predictedWeight)

The dataset linked here contains 1000 rows and the global environment pane shows that my testing data and my training data have "500 obs. of 2 variables" each.

The library hydroGOF should already be loaded properly too.

This is my code snippet wherein I attempt to predict a baby's weight based on the length of the pregnancy in weeks:

library(caret)     # provides train()
library(hydroGOF)  # provides rmse()

ncbabydata=read.csv("nc.csv",header=TRUE,stringsAsFactors = FALSE)
# odd-numbered rows for training, even-numbered rows for testing
trainingbabydata=ncbabydata[seq(1,nrow(ncbabydata),2),c("weeks","weight")]
testingbabydata=ncbabydata[seq(2,nrow(ncbabydata),2),c("weeks","weight")]
model = train(weight ~.,trainingbabydata,method="rf")
predictedWeight=predict(model,testingbabydata)
rmse(testingbabydata$weight, predictedWeight)

Thank you for your time! (I did attempt to google this error message first but found no suitable source that I could understand relatively easily.)

Cyrus Mohammadian

1 Answer

Your two vectors are, in fact, not the same length:

> length(predictedWeight)
[1] 498
> length(testingbabydata$weight)
[1] 500

The reason is that two rows of your testing data contain NA values, and the prediction step simply omits those rows, so you get 498 predictions for 500 observations. Handling missing data in models is a complex topic, but since it's only two rows out of 500, you can just remove them for now and continue your learning:

testingbabydata<-testingbabydata[complete.cases(testingbabydata),]
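
If you want to see where the missing values are before dropping anything, a minimal sketch in base R could look like the following. The `toy` data frame and its values are invented for illustration and only stand in for `testingbabydata`; the column names match the question.

```r
# Toy stand-in for the real testing data (values invented for illustration).
toy <- data.frame(
  weeks  = c(39, NA, 40, 37, NA, 41),
  weight = c(7.6, 6.9, 8.1, 6.2, 7.0, 8.4)
)

# Count the NA values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Find the row positions that contain at least one NA.
incomplete_rows <- which(!complete.cases(toy))
print(incomplete_rows)

# Keep only the complete rows, mirroring the complete.cases() line above.
toy_clean <- toy[complete.cases(toy), ]
print(nrow(toy_clean))
```

Running the same two checks on `testingbabydata` should tell you which column holds the NAs and which rows are affected.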

You can then calculate your RMSE, which you can also do directly, without a helper function such as rmse():

> sqrt(mean((testingbabydata$weight-predictedWeight)^2))
[1] 1.025823

For comparison, here is the RMSE of a baseline model that always predicts the mean weight:

> sqrt(mean((testingbabydata$weight-mean(testingbabydata$weight))^2))
[1] 1.460638
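
The two calculations above can be wrapped in a small function so the length check is explicit. This is only a sketch: `rmse_manual` is a hypothetical name, not part of hydroGOF or caret, and the `obs` and `pred` values are made up for illustration.

```r
# Hypothetical helper: RMSE with an explicit length check,
# so a mismatch fails early with a clear error.
rmse_manual <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  sqrt(mean((obs - pred)^2))
}

# Invented values for illustration only.
obs  <- c(7.6, 8.1, 6.2, 8.4)
pred <- c(7.4, 7.9, 6.8, 8.0)

# RMSE of the "model" predictions.
model_rmse <- rmse_manual(obs, pred)

# RMSE of a baseline that always predicts the mean of the observations.
baseline_rmse <- rmse_manual(obs, rep(mean(obs), length(obs)))

print(c(model = model_rmse, baseline = baseline_rmse))
```

A model is only useful if its RMSE is clearly below the baseline, as it is for the random forest above (1.03 versus 1.46).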