I want to change the columns names with a loop

Question

I have a dataset whose column names look like this:

state.abb, state.area, state.division, state.region

I want to rename the columns by deleting the "state." part, leaving only "abb", "area", "division", and "region". I wrote the code below, which loops over the df columns and uses the substring function, but it neither works nor gives an error. What's wrong with it, please?


    for(e in 1:ncol(df)){
      colnames(df[e])<-substring(colnames(df[e]),7)
    }

I think @akrun's answer should be the accepted one - they provided the solution more fully and first; I only added an answer to provide some explanation and an alternative (there's more than one way to skin a cat) — rg255, May 02 '20 at 22:12
Hey, I was trying to accept both, but it seems that doesn't work. I re-accepted yours because it was the one that put me on the way to exploring the difference between colnames(df[1]) and colnames(df)[1]. You have all my gratitude, thanks! — Houssam Baiz, May 02 '20 at 22:13

akrun · Accepted Answer · 2020-05-02T20:40:46.010

4

Here, we can change colnames(df[e]) to colnames(df)[e], so that the assignment targets the e-th element of df's column-name vector:

for(e in seq_along(df)){
     colnames(df)[e] <- substring(colnames(df)[e],7)
}

substring is vectorized, so we could do this directly without any for loop:

colnames(df) <- substring(colnames(df), 7)
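To try this end to end, here is a minimal sketch. It assumes df was built from R's built-in state vectors, which is a guess on my part, since the question does not show how df was created:

```r
# Assumed reconstruction of the asker's data frame from the built-in
# state.* vectors in the datasets package (not shown in the question)
df <- data.frame(state.abb, state.area, state.division, state.region)

# Drop the first 6 characters ("state.") from every name in one call
colnames(df) <- substring(colnames(df), 7)

print(colnames(df))
```

With that input, the names come out as "abb", "area", "division" and "region". Note that this relies on every name having a prefix of exactly 6 characters.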

Also, if we are removing a prefix up to and including the ., a more general option that works for a prefix of any length is sub. Note that .* is greedy, so this removes everything up to the last . in each name:

colnames(df) <- sub(".*\\.", "", colnames(df))
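A small sketch of how that pattern behaves when the prefixes differ in length. The names in nms are made up for illustration, and the second pattern is an alternative I am adding for comparison, not part of the original suggestion:

```r
# Hypothetical names with prefixes of different lengths (one has no prefix)
nms <- c("state.abb", "us.state.area", "region")

# Greedy .* : removes everything up to and including the LAST dot
last_dot <- sub(".*\\.", "", nms)

# Alternative: removes only up to and including the FIRST dot
first_dot <- sub("^[^.]*\\.", "", nms)

print(last_dot)
print(first_dot)
```

Here last_dot is "abb", "area", "region", while first_dot keeps "state.area" for the second name. Names without a dot are left unchanged by both patterns.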

As an example,

data(mtcars)
colnames(mtcars[1]) <- "hello"
colnames(mtcars[1])
#[1] "mpg" # no change
colnames(mtcars)[1] <- "hello"
colnames(mtcars[1])
#[1] "hello" # changed

edited May 02 '20 at 20:40

answered May 02 '20 at 20:32

akrun


Thanks dude. I thought I could accept as many solutions as possible, but it seems not. Thanks a lot – Houssam Baiz May 02 '20 at 22:20
@HoussamBaiz it's okay. It's only that I thought something was wrong with my solution that I couldn't find – akrun May 02 '20 at 22:21

rg255 · Answer 2 · 2020-05-02T22:13:35.880

As an alternative solution, you could use gsub() to replace all the "state." with nothing (""), here showing that with just a vector:

gsub("state.", "", c("state.abb", "state.area", "state.division", "state.region"))

To replace the column names:

colnames(df) <- gsub("state.", "", colnames(df))
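One caveat worth a quick sketch: in a regular expression an unescaped . matches any character, so "state." can match more than the literal prefix. The name "statement.x" below is a made-up example to show the difference, and fixed = TRUE and the anchored pattern are alternatives I am suggesting:

```r
# "statement.x" is a hypothetical column name used only to show the pitfall
nms <- c("state.abb", "statement.x")

# Unescaped dot: "state." also matches "statem"
loose <- gsub("state.", "", nms)

# Literal matching: the dot is treated as a plain character
literal <- gsub("state.", "", nms, fixed = TRUE)

# Anchored regex: only strips a literal "state." at the start of the name
anchored <- sub("^state\\.", "", nms)

print(loose)
print(literal)
print(anchored)
```

For the column names in the question all three give the same result; the difference only matters when other names happen to contain "state" followed by some other character.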

As a bonus, imagine you want to replace a word or string that occurs in some but not all of your columns. Taking the built in iris dataset as an example, you could replace "Petal" with "P" for the columns where "Petal" is in the column name with the exact same approach:

colnames(iris) <- gsub("Petal", "P", colnames(iris))

I wouldn't bother with a for loop for this job; it's far easier to use a vectorised approach. But to explain your error: when you did colnames(df[1]), you were working with the column name of a single-column data frame that you had isolated from your main data frame, rather than handling the main data frame itself. For example, iris[1] returns a data frame with one column - see str(iris[1]) - so colnames(iris[1]) returns the column name of that isolate. A slight change instead allows you to return (and then change) the 1st element of the vector of column names for iris: colnames(iris)[1].
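To see why the original loop ran without an error and still changed nothing, here is a rough sketch of the steps R goes through for colnames(df[e]) <- value, as far as I understand the replacement-function rules. The two-column df and the variable tmp are stand-ins for illustration:

```r
# Small stand-in data frame built from the built-in state vectors (assumption)
df <- data.frame(state.abb, state.area)
e <- 1

# Roughly what colnames(df[e]) <- "abb" does behind the scenes:
tmp <- df[e]              # copy out a one-column data frame
colnames(tmp) <- "abb"    # the copy is renamed
df[e] <- tmp              # values are written back; df keeps its own names
after_loop_style <- colnames(df)

# The working form replaces one element of df's name vector instead
colnames(df)[e] <- "abb"
after_fix <- colnames(df)

print(after_loop_style)
print(after_fix)
```

The first print still shows "state.abb" for column 1, because assigning into an existing column with df[e] <- ... replaces the data but not the name. The second print shows "abb".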

I was about to go crazy trying to find the difference between colnames(df[1]) and colnames(df)[1]. You have all my gratitude, man (or woman, whoever you are)! Thanks — Houssam Baiz, May 02 '20 at 22:09
