
When I create a scatter plot with ggscatter, R throws this error:

Error in `[.data.frame`(data, , x) : undefined columns selected

The code is below:

mydata<-data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99)

library("ggpubr")
ggscatter(mydata,y = "eng_score"  , x = "`53_target_pre_mover_2_0_model`",  
      add = "reg.line", conf.int = TRUE, 
      cor.coef = TRUE, cor.method = "pearson",
      xlab = "Likely to move", ylab = "Engagement score")

Appreciate the help!

Calvin Nunes
Aditya Dev
  • hey welcome to stackoverflow, please provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) so we can try out the code and reproduce your error – mischva11 Aug 01 '18 at 11:38
  • Thank you mischva. I have updated the code to show the data frame as well! – Aditya Dev Aug 01 '18 at 11:48

2 Answers

0

When you are using

mydata<-data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99)

Syntactic names in R cannot begin with a digit, and data.frame() repairs such names by default, so if you run head(mydata) you will see that the name

53_target_pre_mover_2_0_model

got changed to

X53_target_pre_mover_2_0_model
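The renaming comes from data.frame()'s default check.names = TRUE, which passes the column names through make.names(). A minimal base R sketch, using the same data frame as in the question, shows the effect:

```r
# Default behaviour: check.names = TRUE repairs non-syntactic names
mydata <- data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99)

# Inspect the names that were actually stored in the data frame
print(names(mydata))

# make.names() is the function that performs the repair
repaired <- make.names("53_target_pre_mover_2_0_model")
print(repaired)
```

If names(mydata) in your own session does not show the X prefix, the data was probably not created this way (for example, it was imported by a function that keeps names as they are).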

1) You can use the changed name in the scatter plot:

mydata<-data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99)


library(ggpubr)
ggscatter(mydata,y = "eng_score"  , x = "X53_target_pre_mover_2_0_model",  
         add = "reg.line", conf.int = TRUE, 
         cor.coef = TRUE, cor.method = "pearson",
         xlab = "Likely to move", ylab = "Engagement score")

2) You can stop R from adding the X to the column names:

Pass the argument check.names=F to data.frame() so that it does not check and repair the column names:

mydata<-data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99, check.names=F)
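As a quick check that the original name survives, the following base R sketch inspects the result. Because a name starting with a digit is non-syntactic, it needs backticks with $ or a quoted string with [[ ]]; whether a plotting function accepts such a name depends on how that function looks up the column internally.

```r
# Keep the column names exactly as given
mydata <- data.frame("eng_score" = 1:99,
                     "53_target_pre_mover_2_0_model" = 1:99,
                     check.names = FALSE)

# The second name is stored without the X prefix
print(names(mydata))

# Two ways to access a column with a non-syntactic name
by_string   <- mydata[["53_target_pre_mover_2_0_model"]]
by_backtick <- mydata$`53_target_pre_mover_2_0_model`
```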

mischva11
  • Thank you so much mischva for checking. I tried both these approaches but the same error persists. Also if I try head (mydata), I do not see the X coalesced with the column name. I understand that having a number as the first character of the variable name creates this issue but can we have another workaround this problem as I cannot change the name of the column for now? Thanks again for the help! – Aditya Dev Aug 01 '18 at 13:43
  • @AdityaDev this is very strange, because you should get the x from `data.frame("eng_score" = 1:99, "53_target_pre_mover_2_0_model" = 1:99)`. You could try to provoke this behaviour by adding `check.names=T` ,though. R does this for "protecting" you from errors which are based on number starting variables. At the moment i'm not aware of a solution which works if you keep the col name like it is. The error is based on failing to call the column, so maybe you find another solution, sorry i could not help you – mischva11 Aug 01 '18 at 14:04
0

R does not always like column names that contain numbers, especially when they begin with one. If you replace the digits, for example, it works:

mydata<-data.frame("eng_score" = 1:99, "X_target_pre_mover_X_X_model" = 1:99)

library("ggpubr")
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.1.2
ggscatter(mydata,y = "eng_score"  , x = "X_target_pre_mover_X_X_model",  
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Likely to move", ylab = "Engagement score")
#> `geom_smooth()` using formula 'y ~ x'

Created on 2022-07-11 by the reprex package (v2.0.1)

So I would suggest changing your column names.
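If the original column names have to stay as they are, one possible workaround is to rename only a copy that is used for plotting. This is a sketch under assumptions: plot_data is a hypothetical name, and the data frame below merely stands in for the real data.

```r
# Stand-in for a data frame whose original names must be kept
mydata <- data.frame("eng_score" = 1:99,
                     "53_target_pre_mover_2_0_model" = 1:99,
                     check.names = FALSE)

# plot_data is a hypothetical copy used only for plotting;
# make.names() turns every name into a syntactic one
plot_data <- mydata
names(plot_data) <- make.names(names(plot_data), unique = TRUE)

print(names(plot_data))
```

plot_data can then be passed to ggscatter with x = "X53_target_pre_mover_2_0_model", while xlab and ylab keep the axis labels readable and mydata itself stays unchanged.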

Quinten