This may be a very simple problem, but I can't seem to get past it. I have column names such as X100.4, X100.-4, X100.-5, and so on. I'm trying to run a linear regression, but when I do I get an error:

lm<-lm(X986~X241+X243+X280+X282+X987+X143.2+X239.0+X491.61+X350.-4,data=train)
Error in terms.formula(formula, data = data) : 
  invalid model formula in ExtractVars

It works fine without the variable X350.-4, so I assume that variable is the problem. I tried writing it as 'X350.-4' and "X350.-4", but both gave the same error. I also tried putting quotes around all of the variables, which did not work either.

halo09876
    I think the `-` characters and not the dots/decimal points are the problems: see `?make.names` for the definition of syntactically valid names in R. – Ben Bolker Dec 04 '13 at 14:42
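
To check which column names R treats as non-syntactic, one option is to compare each name with what `make.names` would turn it into. This is only a minimal sketch: `train_demo` is a made-up stand-in for the asker's `train`, and the column values are arbitrary.

```r
# Hypothetical data frame standing in for the asker's `train`
train_demo <- data.frame(a = 1:3, b = 4:6, c = 7:9)
names(train_demo) <- c("X986", "X143.2", "X350.-4")

# A name is syntactically valid if make.names() leaves it unchanged
bad <- names(train_demo)[names(train_demo) != make.names(names(train_demo))]
print(bad)
```

Only the name containing the minus sign is flagged here; the names with plain dots pass through unchanged.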

2 Answers

You can use backticks:

DF <- data.frame(x=1:10, y=rnorm(10))
names(DF)[1] <- "x.-1"

lm(y~`x.-1`, data=DF)

But it would be better to sanitize the names:

names(DF) <- make.names(names(DF))
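
If there are many such columns, typing the backticks by hand gets tedious. One possible approach is to build the formula from the column names as text. This is a sketch under assumptions: the data are random, the second column name `x.-2` is invented for illustration, and `fit` is just a variable name chosen so that the `lm` function is not masked.

```r
set.seed(1)  # reproducible illustrative data
DF <- data.frame(y = rnorm(10), a = 1:10, b = rnorm(10))
names(DF)[2:3] <- c("x.-1", "x.-2")   # non-syntactic names, as in the question

# Wrap every predictor in backticks, then assemble the formula from text
predictors <- setdiff(names(DF), "y")
rhs <- paste0("`", predictors, "`", collapse = " + ")
f <- as.formula(paste("y ~", rhs))

fit <- lm(f, data = DF)
print(all.vars(f))   # the variables the formula refers to
```

The original column names stay intact this way, at the cost of a slightly less readable model call.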
Roland

The problem is with the minus sign ("-"), not the decimals. So if you really need these column names, either use @Roland's approach, or replace the minus signs with something else:

colnames(data) <- gsub(pattern = "-", replacement = "_", x = colnames(data))

Using make.names(...) is a little dicey because it can generate collisions (multiple columns with the same name). Consider:

DF <- data.frame(y=1:3,x.1=6:8,z=11:13)
colnames(DF)[3] <- "x-1"
DF
  y x.1 x-1
1 1   6  11
2 2   7  12
3 3   8  13

names(DF) <- make.names(names(DF))
DF
  y x.1 x.1
1 1   6  11
2 2   7  12
3 3   8  13

You may need to use:

names(DF) <- make.names(names(DF), unique = TRUE)
DF
  y x.1 x.1.1
1 1   6    11
2 2   7    12
3 3   8    13
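
To find out whether a collision would actually occur before renaming, something along these lines might help. It reuses the example data above; `clean` and `collides` are names introduced here for illustration only.

```r
DF <- data.frame(y = 1:3, x.1 = 6:8, z = 11:13)
colnames(DF)[3] <- "x-1"

clean <- make.names(names(DF))          # sanitized names, possibly duplicated
collides <- anyDuplicated(clean) > 0    # TRUE when two columns map to the same name
if (collides) {
  # Fall back to de-duplicated names only when it is needed
  clean <- make.names(names(DF), unique = TRUE)
}
names(DF) <- clean
print(names(DF))
```

Checking first makes it obvious when a suffix such as `.1` was appended by the de-duplication rather than being part of the original name.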
jlhoward