
I have a data frame with two columns:

   names duration
1      J       97
2      G       NA
3      H       53
4      A       23
5      E       NA
6      D       NA
7      C       73
8      F       NA
9      B       37
10     I       67

I want to replace every NA value in the duration column with the value from the names column in the same row. How can I achieve that?

TityBoi

2 Answers


Data

zz <- "names duration
1      J       97
2      G       NA
3      H       53
4      A       23
5      E       NA
6      D       NA
7      C       73
8      F       NA
9      B       37
10     I       67"

df <- read.table(text = zz, header = TRUE)

Solution with dplyr

library(dplyr)

df_new <- df %>% 
    mutate(duration = ifelse(is.na(duration), as.character(names), duration))

Output

    df_new
    #    names duration
    # 1      J       97
    # 2      G        G
    # 3      H       53
    # 4      A       23
    # 5      E        E
    # 6      D        D
    # 7      C       73
    # 8      F        F
    # 9      B       37
    # 10     I       67
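A possible variation on the same idea, offered as a sketch rather than as part of the original answer: dplyr's coalesce() returns the first non-missing value at each position, so it can stand in for the ifelse() call once both columns are converted to character. The data below is a shortened copy of the example, and the name df_coalesce is made up for illustration.

```r
library(dplyr)

# Shortened copy of the example data (first five rows only)
df <- data.frame(
  names = c("J", "G", "H", "A", "E"),
  duration = c(97L, NA, 53L, 23L, NA),
  stringsAsFactors = FALSE
)

# coalesce() needs both inputs to have the same type, so convert both
# columns to character; NA durations are filled from names row by row
df_coalesce <- df %>%
  mutate(duration = coalesce(as.character(duration), as.character(names)))

df_coalesce
```

As with ifelse(), the resulting duration column is character, because a single column cannot hold both numbers and letters.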
emehex
  • Thanks, this worked perfectly for me. I only needed to add one line of code to replace all dots with commas in the numbers in the duration column. – TityBoi Aug 23 '16 at 09:02

We can use is.na to create a logical index ('i1') and then use it to subset both 'names' and 'duration', so that each NA in 'duration' is replaced by the 'names' value from the same row.

i1 <- is.na(df$duration)
df$duration[i1] <- df$names[i1]
df
#   names duration
#1      J       97
#2      G        G
#3      H       53
#4      A       23
#5      E        E
#6      D        D
#7      C       73
#8      F        F
#9      B       37
#10     I       67

NOTE: This changes the class of 'duration' from numeric to character, because the column now holds letters as well as numbers.
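The class change can be checked directly. The following is a minimal sketch on a shortened copy of the data. The as.character() wrapper is an addition that the code above does not use: it is a precaution for the case where 'names' is a factor (for example, when read.table was run with stringsAsFactors = TRUE), so that the labels are assigned rather than the underlying factor codes.

```r
# Shortened copy of the example data (first five rows only)
df <- data.frame(
  names = c("J", "G", "H", "A", "E"),
  duration = c(97L, NA, 53L, 23L, NA),
  stringsAsFactors = FALSE
)

class(df$duration)   # integer before the replacement

# Logical index of the rows where duration is missing
i1 <- is.na(df$duration)

# as.character() is a precaution in case 'names' is a factor
df$duration[i1] <- as.character(df$names[i1])

class(df$duration)   # character after the replacement
df
```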


Alternatively, data.table offers a faster approach. Convert the 'data.frame' to a 'data.table' (setDT(df)) and change the class of 'duration' to character. Then specify the condition in 'i' (is.na(duration)) and assign (:=) the values in 'names' that correspond to that condition to 'duration'. Because the assignment happens in place, it is very efficient.

library(data.table)
setDT(df)[, duration:= as.character(duration)][is.na(duration), duration:= names]

data

df <- structure(list(names = c("J", "G", "H", "A", "E", "D", "C", "F", 
"B", "I"), duration = c(97L, NA, 53L, 23L, NA, NA, 73L, NA, 37L, 
67L)), .Names = c("names", "duration"), row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
akrun