I should admit up front that I found it difficult to come up with a proper title for the issue I am facing.

I have the following data:

        configuration_id     TARGET_CLASS                 UniqueIdentifier  BranchCoverage  Total_Branches  Size    Length  Generations Statements_Executed CoverageTimeline_T1 CoverageTimeline_T2 CoverageTimeline_T3
        ar_statement         com.browsersoft.aacs.User  NA                67559dfd        1               60      46        108          NA                 108                 0.8158776539          0.8381375035
        ar_statement         com.browsersoft.aacs.User  efe4cbdc            1                 60                44    103       240          1087446              0.7525773196        0.7540513682        0.7661337337
        ar_statement         com.browsersoft.aacs.User  NA                aac8afa6        1               60      43        104          NA                 177                 0.765031271         0.8062749834
        ar_statement         com.browsersoft.aacs.User  8567c4bd            1                 60                45    105       388          NA                 0.8680720145          0.9386218251        0.9484536082
        ar_statement         com.browsersoft.aacs.User  94e45912            1                 60                43    101       118          NA                 0.8767466262          0.9471901622        0.9690721649

As you can see, there are NAs in the UniqueIdentifier column. In those rows the NA has pushed the values one column to the right, so each correct value sits in the column to the right of where it belongs. What I want is to remove the NA and replace it with the value from the next column, like this:

    configuration_id     TARGET_CLASS                 UniqueIdentifier  BranchCoverage  Total_Branches  Size    Length  Generations Statements_Executed CoverageTimeline_T1 CoverageTimeline_T2 CoverageTimeline_T3
    ar_statement         com.browsersoft.aacs.User  67559dfd            1                 60                46      108     108          NA                 0.8158776539          0.8381375035
    ar_statement         com.browsersoft.aacs.User  efe4cbdc            1                 60                44      103     240          1087446              0.7525773196        0.7540513682        0.7661337337
    ar_statement         com.browsersoft.aacs.User  aac8afa6            1                 60                43      104     177          NA                 0.765031271         0.8062749834
    ar_statement         com.browsersoft.aacs.User  8567c4bd            1                 60                45      105     388          NA                 0.8680720145          0.9386218251        0.9484536082
    ar_statement         com.browsersoft.aacs.User  94e45912            1                 60                43      101     118          NA                 0.8767466262          0.9471901622        0.9690721649

To make it clearer: for every row where UniqueIdentifier is NA, replace the value of each column with the value in the next column (that is, shift the values back one column to the left).

I hope my question is clear.

How can I do that?

Adam Amin
    Seems like [Shifting non-NA cells to the left](https://stackoverflow.com/questions/23285215/shifting-non-na-cells-to-the-left); [How to move cells with a value row-wise to the left in a dataframe](https://stackoverflow.com/questions/26651606/how-to-move-cells-with-a-value-row-wise-to-the-left-in-a-dataframe) – Henrik Jul 23 '20 at 11:51

1 Answer

I think you are looking for

    data$UniqueIdentifier <- dplyr::coalesce(data$UniqueIdentifier, data$BranchCoverage)

Or using base R:

    data$UniqueIdentifier <- ifelse(is.na(data$UniqueIdentifier), data$BranchCoverage, data$UniqueIdentifier)

Edit: Your first data set is a bit hard to read; I couldn't tell whether only BranchCoverage was affected or every other value in the row as well. If every value got pushed to the right, you may want to check how you are reading the data in. Either way, I think you can fix it like this:

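    # For each row whose UniqueIdentifier is NA, shift columns 4..n one place
    # to the left and pad the last column with NA.
    n <- ncol(data)
    for (i in seq_len(nrow(data))) {
      if (is.na(data$UniqueIdentifier[i])) {
        data[i, 3:n] <- c(data[i, 4:n], NA)
      }
    }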

This is kind of an ugly solution, but it should work.
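
If the loop turns out to be slow on a large data frame, the same shift can probably be done for all affected rows at once. What follows is only a minimal sketch: it assumes the columns were read as character (not factor), and both the toy data frame and the helper name shift_left_where_na are made up for illustration, not taken from the question.

```r
# Toy data (hypothetical): row 1 has its values pushed one column to the right.
df <- data.frame(
  configuration_id = c("ar_statement", "ar_statement"),
  UniqueIdentifier = c(NA, "efe4cbdc"),
  BranchCoverage   = c("67559dfd", "1"),
  Total_Branches   = c("1", "60"),
  Size             = c("60", "44"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for rows where `key` is NA, move every column to the
# right of `key` one place left and fill the last column with NA.
shift_left_where_na <- function(df, key = "UniqueIdentifier") {
  k   <- match(key, names(df))   # position of the key column
  n   <- ncol(df)
  bad <- is.na(df[[key]])        # rows that need shifting
  if (k < n && any(bad)) {
    df[bad, k:(n - 1)] <- df[bad, (k + 1):n]
    df[bad, n] <- NA
  }
  # Shifted columns are still character; re-guess their types.
  df[] <- lapply(df, function(col) {
    if (is.character(col)) type.convert(col, as.is = TRUE) else col
  })
  df
}

fixed <- shift_left_where_na(df)
print(fixed)
```

The type.convert step is there because a column that held misplaced identifiers would have been read as text, so it stays character after the shift unless it is converted again.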

If only BranchCoverage was affected and you want to set every value in that column to 1, you could do data$BranchCoverage <- 1.

Also, thanks to CPak for the comment.