
I am using a data frame called myData and I am trying to clean the data. In one of the rows (row 3 in the example below), there was a data entry error: B was left blank, so C, D, and E now contain the values meant for B, C, and D. How can I fix this in RStudio using basic commands? I am totally stuck.

   A   B   C   D   E
 --------------------
1  a   3   c   2   f
2  a   2   b   1   f
3  a       2   c   1
4  a   1   b   2   f
5  b   2   c   3   e
Ronak Shah
theShuffle
  • Just a note: my actual data set has 10,000+ rows and 20+ columns, so I guess it would be possible to reference a specific row and update the entry, but I am not even sure how to do that properly. I still need it to maintain each type of data if possible. – theShuffle Oct 03 '19 at 04:46
  • So for every row where `B` is blank, you want to shift the values in that row one column to the left? That would leave column `E` blank or `NA` for that row. Is that correct? – Ronak Shah Oct 03 '19 at 04:47
  • I think I poorly explained it. I want specifically row 3 to read "a, 2, c, 1, NA", as this error occurs only once. – theShuffle Oct 03 '19 at 04:52

1 Answer


We can first find the rows where B is blank (an empty string, not NA), then shift the values in those rows one column to the left and set the last column to NA.

rows <- df$B == ""
df[rows, 2:(ncol(df) - 1)] <- df[rows, 3:ncol(df)]
df[rows, ncol(df)] <- NA

df
#  A B C D    E
#1 a 3 c 2    f
#2 a 2 b 1    f
#3 a 2 c 1 <NA>
#4 a 1 b 2    f
#5 b 2 c 3    e
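
One caveat worth sketching: if B also contains real `NA` values, `df$B == ""` returns `NA` for those rows, and a logical index containing `NA` is not accepted in a data frame assignment. A minimal variant, assuming a hypothetical extra `NA` in row 5 of B, wraps the comparison in `which()` so that only truly blank rows are selected:

```r
# Sample data from the answer, with one assumed NA added to B (row 5)
df <- data.frame(
  A = c("a", "a", "a", "a", "b"),
  B = c("3", "2", "", "1", NA),
  C = c("c", "b", "2", "b", "c"),
  D = c("2", "1", "c", "2", "3"),
  E = c("f", "f", "1", "f", "e"),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons and returns integer row positions
rows <- which(df$B == "")

# Shift columns 3..n into 2..(n-1) for the blank rows, then blank out the last column
df[rows, 2:(ncol(df) - 1)] <- df[rows, 3:ncol(df)]
df[rows, ncol(df)] <- NA
```

Row 5 keeps its `NA` in B and is otherwise untouched; only row 3 is shifted.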

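To restore sensible column types after the shift, we can use type.convert, which re-infers each column's type from its values. Passing as.is explicitly avoids relying on the default, which differs between R versions: as.is = TRUE keeps text columns as character, while as.is = FALSE turns them into factors.

df <- type.convert(df, as.is = TRUE)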
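
To see what `type.convert` does here, the following sketch starts from the already-shifted data (all columns still character) and inspects the column classes afterwards. The variable name `classes` is just for illustration.

```r
# Data as it looks after the shift: every column is still character
df <- data.frame(
  A = c("a", "a", "a", "a", "b"),
  B = c("3", "2", "2", "1", "2"),
  C = c("c", "b", "c", "b", "c"),
  D = c("2", "1", "1", "2", "3"),
  E = c("f", "f", NA, "f", "e"),
  stringsAsFactors = FALSE
)

# Re-infer column types; as.is = TRUE leaves text columns as character
df <- type.convert(df, as.is = TRUE)

# Named character vector of column classes
classes <- sapply(df, class)
classes
#           A           B           C           D           E
# "character"   "integer" "character"   "integer" "character"
```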

data

df <- structure(list(A = c("a", "a", "a", "a", "b"), B = c("3", "2", 
"", "1", "2"), C = c("c", "b", "2", "b", "c"), D = c("2", "1", 
"c", "2", "3"), E = c("f", "f", "1", "f", "e")), row.names = c("1", 
"2", "3", "4", "5"), class = "data.frame")
Ronak Shah
  • What if I wanted it so that instead of finding any instance of blank, it only affected row 3? – theShuffle Oct 03 '19 at 05:26
  • @theShuffle Do `df[3, 2:(ncol(df) - 1)] <- df[3, 3:ncol(df)]` if you want it only for row 3. Or even shorter `df[3, 2:4] <- df[3, 3:5]` if you have only 5 columns as shown. – Ronak Shah Oct 03 '19 at 05:27
  • I'm getting an error because some of my data types are factors. How would I work around this? – theShuffle Oct 03 '19 at 05:38
  • First convert all the data to character with `df[] <- lapply(df, as.character)`, then do `df[3, 2:4] <- df[3, 3:5]`, and then run `df <- type.convert(df)` to convert it back. – Ronak Shah Oct 03 '19 at 05:40
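
Putting the comments above together, a minimal sketch of the row-3-only fix for a data frame whose text columns are factors might look like this. It assumes the five-column example data; the explicit `df[3, 5] <- NA` step and `as.is = FALSE` are additions here so that E ends up `NA` in row 3 and the text columns come back as factors.

```r
# Example data with factor columns (assumed to mirror the asker's situation)
df <- data.frame(
  A = c("a", "a", "a", "a", "b"),
  B = c("3", "2", "", "1", "2"),
  C = c("c", "b", "2", "b", "c"),
  D = c("2", "1", "c", "2", "3"),
  E = c("f", "f", "1", "f", "e"),
  stringsAsFactors = TRUE
)

# 1. Convert every column to character so values can move between columns freely
df[] <- lapply(df, as.character)

# 2. Shift row 3 one column to the left and blank out the last column
df[3, 2:4] <- df[3, 3:5]
df[3, 5] <- NA

# 3. Re-infer types; as.is = FALSE converts text columns back to factors
df <- type.convert(df, as.is = FALSE)
```

After this, row 3 reads a, 2, c, 1, NA, with B and D as integers and A, C, and E as factors.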