I am working with some large datasets that contain special characters in their column names. The column names look something like: "@c_age1619_da * ((df.age >= 16) & (df.age <= 19))" or "sovtoll_available == False". What would be the best way to work with these names? Should I keep the names as they are or rename them to more R-friendly names? When I call them in cases like df$value, R mistakenly interprets the column name as a function!
-
I would recommend removing them or renaming them. Perhaps there is also a change to your data import process that can be done. If you are unfamiliar with R, the code to do each is `names(yourdata) <- NULL` or `names(df1) <- c("col1", "col2", "col3")` – jpsmith Dec 09 '21 at 01:54
1 Answer
The only advantage to keeping the non-standard names is if you want to use those as labels in a plot or table or something. But it will make it very hard to work with the data, and those names could be reintroduced as labels later. You can use non-standard names by putting them in backticks, e.g.,
df$`@c_age1619_da`
Some editors (like RStudio) will correctly auto-complete these non-standard names, making them somewhat easier to work with, but still not as nice as standard names.
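As a minimal sketch of the backtick approach, the example below builds a small, hypothetical data frame (`df` and its values are made up for illustration) that uses the two column names from the question, then reads one column back. It assumes the data was created or imported with `check.names = FALSE`, so the original names were left untouched.

```r
# Hypothetical data frame using the two column names from the question.
# check.names = FALSE stops data.frame() from rewriting the names.
df <- data.frame(
  `@c_age1619_da * ((df.age >= 16) & (df.age <= 19))` = c(1, 0, 1),
  `sovtoll_available == False` = c(TRUE, FALSE, TRUE),
  check.names = FALSE
)

# Backticks make R treat the whole name as a single symbol.
via_dollar <- df$`sovtoll_available == False`

# [[ ]] with an ordinary string works as well and needs no backticks.
via_brackets <- df[["sovtoll_available == False"]]
```

Both forms return the same column; the string form is often easier when the name is stored in a variable.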
Renaming them to standard names is generally better. Many functions that read in data will do this automatically. You can use the `make.names` function to convert the non-standard names to standard names, mostly by replacing any special characters with `.`s. Like this:
names(my_data) = make.names(names(my_data))
But generally the best is to make meaningful names manually. `sovtoll_available....False` isn't a very friendly name either, compared to something like `sovtoll_unavailable`.
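To make the comparison concrete, here is a small sketch that contrasts the automatic result with a manual rename. The data frame `my_data` and its values are hypothetical, and the `column_labels` lookup is just one possible way to keep the original names around for later use as plot or table labels.

```r
# Hypothetical one-column data frame with a name from the question.
my_data <- data.frame(
  `sovtoll_available == False` = c(TRUE, FALSE),
  check.names = FALSE
)

# What make.names() would produce: each special character becomes ".".
auto_names <- make.names(names(my_data))

# Keep the original names so they can be reused as labels later.
original_names <- names(my_data)

# Manual rename to something meaningful.
names(my_data)[names(my_data) == "sovtoll_available == False"] <- "sovtoll_unavailable"

# Lookup from the new name back to the original label.
column_labels <- setNames(original_names, names(my_data))
```

With this, `column_labels[["sovtoll_unavailable"]]` gives back the original name whenever it is needed for display.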

-
Thank you for your comprehensive explanation. I have one more question. Does `clean_names` function from janitor library do similar changes to the names as `make.names` do? – Nazanin Dec 09 '21 at 09:56
-
I've never used it before, but looking at the documentation it sounds similar. Sounds like it uses `_` rather than `.`. Being a newer function in a well-maintained package, it might do a nicer job overall. – Gregor Thomas Dec 09 '21 at 13:43