The dataset can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

I'm getting the following error:

formula(formula, data = data) : 
  invalid model formula in ExtractVars

Using the following code:

install.packages("rpart")
library("rpart")

# you'll need to change the following from windows to work on a linux box:
mydata <- read.csv(file="c:/Users/md7968/downloads/winequality-red.csv")

# grow tree 
fit <- rpart(YouSweetBoy ~ "residual sugar" + "citric acid", method = "class", data = mydata)

Mind you, I've changed the delimiters in the CSV file to commas.

Perhaps it's not reading the data correctly. Forgive me, I'm new to R and not a very good programmer.

dgene54

3 Answers

Look at names(mydata). When you create a data.frame, read.table() (and read.csv(), which calls it) will turn "bad" column names into good ones. You can't (well, shouldn't) have a space in a column name, so R changes spaces to periods. Also, you should never have quoted strings in a formula. Try

fit <- rpart(quality ~ residual.sugar + citric.acid, method = "class", data = mydata)

(I have no idea what "YouSweetBoy" was supposed to be, since that wasn't in the dataset, so I changed it to "quality".)
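
To see the renaming described above without touching the real file, here is a minimal sketch. The inline text is a made-up stand-in for the wine CSV (only four columns, invented values); `check.names` is the read.table() argument that controls whether headers are passed through make.names().

```r
# Hypothetical miniature of the wine file: same header style, invented values.
csv_text <- 'fixed acidity;citric acid;residual sugar;quality
7.4;0.00;1.9;5
7.8;0.04;2.6;5
11.2;0.56;1.9;6'

# Default (check.names = TRUE): spaces in the header become periods.
mangled <- read.table(text = csv_text, sep = ";", header = TRUE)
print(names(mangled))

# check.names = FALSE keeps the header exactly as written in the file.
verbatim <- read.table(text = csv_text, sep = ";", header = TRUE,
                       check.names = FALSE)
print(names(verbatim))
```

Running names() on your own data frame in the same way shows which spelling the formula has to use.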

MrFlick
  • Thank you so much! However, I ran the code and got the following: Error in eval(expr, envir, enclos) : object 'quality' not found – dgene54 Jan 15 '15 at 16:02
  • The data set you linked to had a "quality" column. Replace that with whatever you want your response variable to be. – MrFlick Jan 15 '15 at 16:18
  • There was a quality column in the sheet, however, I still get the error. – dgene54 Jan 15 '15 at 19:04
  • I don't know what to tell you. This works for me: `mydata<-read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";", header=TRUE);rpart(quality ~ residual.sugar + citric.acid, method = "class", data = mydata);` tested with R version 3.1.1 and rpart_4.1-8. – MrFlick Jan 15 '15 at 19:28
  • Thank you so much, that worked! It was probably just my version of CSV messing up. You're a godsend! – dgene54 Jan 15 '15 at 20:42
  • Yes, but sometimes you want spaces in the names, usually when the script output is some document and you want all the graphs to have nice labels. e.g. using `knitr`... @Huiyan Wan's solution works well. – Jiří May 22 '21 at 08:39
Removing the spaces from the independent variable names and taking off the quotes made it work.

Instead of "residual sugar", use residual_sugar
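
A name like residual_sugar only works if the column is actually called that. One way to get there, sketched with a hypothetical two-column data frame (`df` and its values are made up for illustration), is to rewrite the names once after loading:

```r
# Hypothetical data frame whose column names still contain spaces.
df <- data.frame(`residual sugar` = c(1.9, 2.6),
                 `citric acid` = c(0.00, 0.04),
                 check.names = FALSE)

# Replace every space in the column names with an underscore.
names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
print(names(df))
```

After this, the columns can be used unquoted in a formula, e.g. `~ residual_sugar + citric_acid`.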

Alternatively, wrap your variable names in backticks, like so:

`residual sugar`

This should work:

fit <- rpart(quality ~ `residual sugar` + `citric acid`, method = "class", data = mydata)
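
Note that backticks only help when the column names really do contain spaces, which with read.table()/read.csv() means loading with check.names = FALSE. A small sketch under that assumption, using a made-up data frame `wine` and base R's model.frame() in place of a full rpart() call to check that the formula resolves against the data:

```r
# Hypothetical data with space-containing names kept intact.
wine <- data.frame(quality = c(5, 5, 6),
                   `residual sugar` = c(1.9, 2.6, 1.9),
                   `citric acid` = c(0.00, 0.04, 0.56),
                   check.names = FALSE)

# Backticks let non-syntactic names appear as ordinary symbols in a formula.
f <- quality ~ `residual sugar` + `citric acid`
print(all.vars(f))

# model.frame() looks the formula's variables up in the data frame.
mf <- model.frame(f, data = wine)
print(nrow(mf))
```

If the names were already converted to residual.sugar and citric.acid on load, the backticked spelling will not be found and the period spelling is the one to use.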
Huiyan Wan