
I have a dataset whose column names look like this:

state.abb, state.area, state.division, state.region

I want to rename the columns by deleting the "state." prefix, leaving only "abb", "area", "division", and "region". I wrote the code below, which loops over the columns of df and uses substring(), but it neither works nor gives an error. What is wrong with it?


    for(e in 1:ncol(df)){
      colnames(df[e])<-substring(colnames(df[e]),7)
    }

Phil
  • I think @akrun's answer should be the accepted one: they provided the solution more fully and first. I only added an answer to provide some explanation and an alternative (there's more than one way to skin a cat). – rg255 May 02 '20 at 22:12
  • Hey, I was trying to accept both, but it seems that doesn't work. I re-accepted yours because it was the one that put me on the way to exploring the difference between colnames(df[1]) and colnames(df)[1]. You have all my gratitude, thanks! – Houssam Baiz May 02 '20 at 22:13

2 Answers


Here, we can change colnames(df[e]) to colnames(df)[e]:

    for(e in seq_along(df)){
      colnames(df)[e] <- substring(colnames(df)[e], 7)
    }

substring is vectorized, so we can do this directly without a for loop:

    colnames(df) <- substring(colnames(df), 7)

Also, if the goal is to remove the prefix up to and including the ".", a more general option that works for a prefix of any length is sub. Note that the greedy .* matches up to the last "." in each name:

    colnames(df) <- sub(".*\\.", "", colnames(df))
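To make the difference between the fixed-offset and pattern-based approaches concrete, here is a minimal sketch on a plain character vector. The names in nms are hypothetical and chosen only to show what happens when the prefix is not exactly six characters long; the third pattern is an alternative I am adding for illustration, not something from the answer above.

```r
# Hypothetical column names with prefixes of different lengths
nms <- c("state.abb", "state.area", "us.region", "plain")

# Fixed offset: only correct when the prefix is exactly 6 characters long
by_position <- substring(nms, 7)

# Greedy pattern: drops everything up to and including the LAST dot
by_last_dot <- sub(".*\\.", "", nms)

# Alternative: drops only up to and including the FIRST dot
by_first_dot <- sub("^[^.]*\\.", "", c(nms, "a.b.c"))

print(by_position)
print(by_last_dot)
print(by_first_dot)
```

With substring, "us.region" is cut in the wrong place and "plain" becomes an empty string, whereas sub leaves names without a dot untouched.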

As an example,

    data(mtcars)
    colnames(mtcars[1]) <- "hello"
    colnames(mtcars[1])
    #[1] "mpg" # no change
    colnames(mtcars)[1] <- "hello"
    colnames(mtcars[1])
    #[1] "hello" # changed
akrun

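As an alternative solution, you could use gsub() to replace every "state." with nothing (""). The pattern is a regular expression by default, in which "." matches any character, so pass fixed = TRUE to match "state." literally. Here it is with just a vector:

    gsub("state.", "", c("state.abb", "state.area", "state.division", "state.region"), fixed = TRUE)

To replace the column names:

    colnames(df) <- gsub("state.", "", colnames(df), fixed = TRUE)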
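One detail worth checking when using gsub() this way: its pattern is treated as a regular expression by default, so an unescaped "." matches any single character. The sketch below uses a hypothetical second name, "estates_total", purely to show the effect; fixed = TRUE and the escaped, anchored pattern are two standard ways to match the dot literally.

```r
# Hypothetical names; "estates_total" contains no literal "state."
nms <- c("state.abb", "estates_total")

# Default: "." is a regex wildcard, so "states" inside the second name matches
as_regex <- gsub("state.", "", nms)

# fixed = TRUE: the pattern is matched as a literal string
as_literal <- gsub("state.", "", nms, fixed = TRUE)

# Escaped dot, anchored so only a leading "state." is removed
anchored <- gsub("^state\\.", "", nms)

print(as_regex)
print(as_literal)
print(anchored)
```

For the four column names in the question all three calls give the same result; the difference only shows up when another name happens to contain "state" followed by some other character.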

As a bonus, imagine you want to replace a word or string that occurs in some but not all of your columns. Taking the built-in iris dataset as an example, you could replace "Petal" with "P" in the columns whose names contain "Petal" using exactly the same approach:

    colnames(iris) <- gsub("Petal", "P", colnames(iris))

I wouldn't bother with a for loop for this job; a vectorised approach is far easier. But to explain your error: when you did colnames(df[1]), you were getting the column name of a single-column data frame that you had isolated from your main data frame, rather than handling the main data frame itself. For example, iris[1] returns a data frame with one column - see str(iris[1]) - so colnames(iris[1]) returns the column name of that isolate. A slight change instead lets you return (and then change) the first element of the vector of column names for iris: colnames(iris)[1].
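To see why the original loop runs without an error yet changes nothing, it may help to write out by hand roughly what R does for colnames(df[1]) <- "abb". This is a sketch of the behaviour, not R's exact internals, and the two-row df is a made-up stand-in for the real data.

```r
# Made-up stand-in for the real data
df <- data.frame(state.abb = c("AL", "AK"), state.area = c(51609, 589757))

# Roughly what colnames(df[1]) <- "abb" does behind the scenes
tmp <- df[1]            # one-column copy of df
colnames(tmp) <- "abb"  # the copy is renamed
df[1] <- tmp            # values are written back; df keeps its own names
names_after_copy <- colnames(df)

# Modifying the names vector of df itself does change the name
colnames(df)[1] <- "abb"
names_after_direct <- colnames(df)

print(names_after_copy)
print(names_after_direct)
```

The rename only ever reaches the temporary copy, and assigning that copy back into df[1] replaces the column's values while keeping df's existing column names.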

rg255
  • I was about to go crazy trying to find the difference between colnames(df[1]) and colnames(df)[1]. You have all my gratitude, man (or woman, whoever you are)! Thanks – Houssam Baiz May 02 '20 at 22:09