I have a large data-set made up of many very specific variables. I am looking for a quick way to simplify the column names rather than manually renaming over 1000 columns.
total.population.2020 <- c("1","2" )
total.population.2020.both.sexes <- c("3", "4")
total.population.2020.sexes.males.14.to.16.years <- c("7", "9")
total.income.2020 <- c("55", "40")
total.income.2020.25.to.30.years <- c("80", "90")
df <- data.frame(total.population.2020, total.population.2020.both.sexes, total.population.2020.sexes.males.14.to.16.years, total.income.2020, total.income.2020.25.to.30.years)
I ran the clean_names function from the janitor package first, because using gsub/abbreviate on the original df wiped out each column name entirely, leaving it empty rather than simplified.
library(janitor)
df <- clean_names(df)
Then I run gsub/abbreviate. However, the abbreviations are still very long (10+ characters) and drop numbers that I consider important identifiers (e.g. ages 20 to 25), so I still have to change the column names manually.
names(df) <- abbreviate(gsub("_", " ", names(df)))
df
Is there a simpler method? How would you approach a large data-set with long column names?
Expected Output:
Old | New |
---|---|
total.population.2020 | tp |
total.population.2020.both.sexes | tpb |
total.population.2020.sexes.males.14.to.16.years | tpm14_16 |
total.income.2020 | ti |
total.income.2020.25.to.30.years | ti25_30 |
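
To make the pattern behind the expected output concrete, here is a minimal base R sketch of one possible rule set. It is an illustration under assumptions, not a definitive method: drop 4-digit years, drop filler words, keep age ranges as N_M, and keep only the first letter of every other word. The helper name shorten_name and the filler list ("sexes", "years") are hypothetical and would need adjusting for the real data-set. It works on the original dotted names and also on names that clean_names has converted to underscores.

```r
# Hypothetical helper: shorten one dotted or underscored column name.
# The 'filler' words are an assumption based on the five example names.
shorten_name <- function(x, filler = c("sexes", "years")) {
  # Join age ranges such as "14.to.16" into a single token, "14-16"
  x <- gsub("([0-9]+)[._]to[._]([0-9]+)", "\\1-\\2", x)
  tokens <- strsplit(x, "[._]")[[1]]
  tokens <- tokens[!grepl("^[0-9]{4}$", tokens)]  # drop 4-digit years
  tokens <- tokens[!tokens %in% filler]           # drop filler words
  is_range <- grepl("^[0-9-]+$", tokens)
  # Numeric tokens are kept (range written as N_M); words become initials
  parts <- ifelse(is_range,
                  sub("-", "_", tokens, fixed = TRUE),
                  substr(tokens, 1, 1))
  paste0(parts, collapse = "")
}

old <- c("total.population.2020",
         "total.population.2020.both.sexes",
         "total.population.2020.sexes.males.14.to.16.years",
         "total.income.2020",
         "total.income.2020.25.to.30.years")

# make.unique() appends .1, .2, ... if two names collapse to the same code
new <- make.unique(vapply(old, shorten_name, character(1), USE.NAMES = FALSE))

# Keep a lookup table so the short codes can be traced back later
lookup <- data.frame(old = old, new = new)
print(lookup)

# To apply to the data frame: names(df) <- new
```

With over 1000 columns, first-letter codes are likely to collide, which is why the sketch passes the result through make.unique and keeps a lookup table rather than relying on the short names alone.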