0

for a project I needed to merge an Excel and an SPSS file with R.
Not sure if this was my best idea. I got the merge done, however in the process I had to use the attribute(col)$label as the name in order to work.

My final merged data.frame has thus way to long column names including special characters (e.g. :). Here are the first few examples how the colnames of the current merged df look like

colnames(combined_retro)
  [1] "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)"                                                             
  [2] "Studiencode: [01]"                                                                                                       
  [3] "Format"                                                                                                                  
  [4] "Geschlecht"                                                                                                              
  [5] "Alter (direkt): Ich bin   ... Jahre"                                                                                     
  [6] "Staatsangehörigkeit"   

So I created another data.frame Naming_Back in which I have two columns: Name Label

Naming_Back
   Name     Label 
 1 CASE     Interview-Nummer (fortlaufend)                            
 2 SERIAL   Seriennummer (sofern verwendet)                           
 3 REF      Referenz (sofern im Link angegeben)                       
 4 QUESTNNR Fragebogen, der im Interview verwendet wurde              
 5 MODE     Interview-Modus  

So now I would like to iterate over the Columns of my merged data.frame combined_retro and check if the current name of the column (e.g. "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)" is available in the Label column of the second (Naming_Back) data.frame. If it is I would like to exchange the current column name with the one provided by the Name column.

My Current approach is the following loop:

for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    colnames(retro[ , i]) <- new_name_buffer$Name
    print(colnames(retro[ , i]))
    print(new_name_buffer$Name)
  }
}

Examples for the print commands from the loop

[1] "Geschlecht"
[1] "SD02"
[1] "Staatsangehörigkeit"
[1] "SD04"
[1] "Staatsangehörigkeit: Anders"
[1] "SD04_04"

So obviously the problem is this line colnames(retro[ , i]) <- new_name_buffer$Name as it does not change the column name. Has anyone a quick idea how to fix it?

EDIT: Found a solution, by creating a character vector and stepwise filling it with either the abbreviated name when available or the old name if not

new_col_names <- c()
for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    colnames(retro[ , i]) <- new_name_buffer$Name
    new_col_names <- c(new_col_names, new_name_buffer$Name)
  }
  else{
    new_col_names <- c(new_col_names, colnames(retro[ , i]))
  }
}
colnames(retro) <- new_col_names

EDIT 2: Just found an alternative solution to overwrite a column name while iterating over the columns with a for loop, you can just do names(dataframe)[index] and then just assign a new value with <- "newColName"

  for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    names(retro)[i] <- new_name_buffer$Name
    print(colnames(retro[ , i]))
    print(new_name_buffer$Name)
  }
}
Björn
  • 1,610
  • 2
  • 17
  • 37
  • 1
    I would have thought the assignment would have been as `colnames(retro)[i] <-`. Why? ..Because `retro[ , i]` is not a dataframe; it's just an atomic vector, so `colnames<-` shouldn't be able to work on it. – IRTFM Jun 14 '21 at 15:20
  • @IRTFM you are right, a work around would be `names(retro)[i] <-` – Björn Jun 23 '21 at 08:26
  • If that is it then you should write an answer with an explanation. (I'm not planning on doing so.) – IRTFM Jun 23 '21 at 17:33
  • Thanks for the suggestion I did answer it. Hope the answer makes it clear, and might be helpful to readers of this in the future. – Björn Jun 24 '21 at 14:01

1 Answers1

1

The problem (as correctly identified in the comments to the OP by @IRTFM) with the original code was, that the assignment of the new column name: colnames(retro[ , i]) <- new_name_buffer$Name was not working properly, as colnames does not work on an atomic vector.

I found work-around to overwrite a column, name while iterating over the columns of a data.frame with a for loop. One can just do call names(dataframe)[index] and then just assign a new column name with <- "newColName"in my example the important line would thus look like this:

Assigning a new column name

names(retro)[i] <- new_name_buffer$Name

The Complete solution with the for loop

for(i in 1:ncol(retro)) {       # for-loop over columns
      # Check if a row with the label is available in the Naming_Back dataframe
      new_name_buffer <- Naming_Back %>% 
         filter(Label == colnames(retro[ , I])) 

  # When a Name matching the label is found, replace the old name 
  if(!(is_empty(new_name_buffer$Name))){
    names(retro)[i] <- new_name_buffer$Name
  }
}
Björn
  • 1,610
  • 2
  • 17
  • 37