
I am working with some large datasets that contain special characters in their column names. The column names look something like: "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))" or "sovtoll_available == False". What would be the best way to work with these names? Should I keep the names as they are or rename them to more R-friendly names? When I call them in cases like df$value, R mistakenly interprets the column name as a function!

Gregor Thomas
Nazanin
  • I would recommend removing them or renaming them. Perhaps there is also a change to your data import process that can be done. If you are unfamiliar with R, the code to do each is `names(yourdata) <- NULL` or `names(df1) <- c("col1", "col2", "col3")` – jpsmith Dec 09 '21 at 01:54

1 Answer


The only advantage to keeping the non-standard names is if you want to use those as labels in a plot or table or something. But it will make it very hard to work with the data, and those names could be reintroduced as labels later. You can use non-standard names by putting them in backticks, e.g.,

df$`@c_age1619_da`

Some editors (like RStudio) will correctly auto-complete these non-standard names, making them somewhat easier to work with, but still not as nice as standard names.
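As a minimal sketch of the backtick approach, here is a small made-up data frame that uses the two column names from the question (the values are invented for illustration). Besides backticks after `$`, the `[[` operator accepts the name as an ordinary string, which tends to be easier inside functions and loops:

```r
# Hypothetical data frame carrying the two column names from the question.
# check.names = FALSE stops data.frame() from sanitising the names.
df <- data.frame(
  `@c_age1619_da * ((df.age >= 16) & (df.age <= 19))` = c(0.5, 1.2),
  `sovtoll_available == False` = c(TRUE, FALSE),
  check.names = FALSE
)

# Backticks quote the whole name, so R does not try to parse it as an expression
df$`sovtoll_available == False`

# [[ ]] takes the column name as a plain character string
col <- "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))"
df[[col]]
```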

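Renaming them to standard names is generally better. Many functions that read in data will do this automatically (read.csv, for example, does so by default through its check.names argument). You can use the make.names function to convert non-standard names to standard ones; it replaces each invalid character with a . and prepends X when a name would otherwise start with an invalid character. Like this: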

names(my_data) <- make.names(names(my_data))
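To see what this does before touching the real data, you can run make.names on the names alone. The sketch below uses the two names from the question; `unique = TRUE` is an optional extra that guards against two different originals collapsing to the same cleaned name, and the `label_lookup` vector is just one possible way to keep the originals around for plot or table labels:

```r
old_names <- c(
  "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))",
  "sovtoll_available == False"
)

# Convert to syntactically valid names; unique = TRUE de-duplicates collisions
new_names <- make.names(old_names, unique = TRUE)
new_names

# Optional: map each cleaned name back to its original for later use as a label
label_lookup <- setNames(old_names, new_names)
label_lookup[["sovtoll_available....False"]]
```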

But generally the best approach is to write meaningful names manually. sovtoll_available....False isn't a very friendly name either, compared to something like sovtoll_unavailable.
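One way to do the manual renaming is with a named vector that maps each old name to its replacement. This is only a sketch: the data values are invented, and the new name c_age1619_da is a suggestion rather than anything taken from your data.

```r
# Hypothetical data frame with the original, non-standard names
df <- data.frame(a = c(0.5, 1.2), b = c(TRUE, FALSE))
names(df) <- c(
  "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))",
  "sovtoll_available == False"
)

# Old name on the left, hand-picked new name on the right
rename_map <- c(
  "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))" = "c_age1619_da",
  "sovtoll_available == False" = "sovtoll_unavailable"
)

# Rename only the columns that appear in the map; others are left untouched
hits <- names(df) %in% names(rename_map)
names(df)[hits] <- unname(rename_map[names(df)[hits]])

names(df)
```

Note that sovtoll_unavailable keeps the same TRUE/FALSE values as the original column, so the name matches the `== False` condition it came from.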

Gregor Thomas
  • Thank you for your comprehensive explanation. I have one more question. Does the `clean_names` function from the janitor package make similar changes to the names as `make.names` does? – Nazanin Dec 09 '21 at 09:56
  • I've never used it before, but looking at the documentation it sounds similar. Sounds like it uses `_` rather than `.`. Being a newer function in a well-maintained package, it might do a nicer job overall. – Gregor Thomas Dec 09 '21 at 13:43