I'm working with the text layer of a PDF and have some minor corrections to make...

The tidy dataframe I've generated has one or two data values that are off by a row. I have the 'coordinates' of the incorrectly positioned values (defined by a combination of other variables) and I have the positions of where they should actually go. I just need to move data values from A to B and filter out the row corresponding to A. For example:

Change this:

data.frame(A = 1:3,
           B = 1:3,
           C = c("Oops wrong row", NA, "this one is OK"))

Into this:

data.frame(A = 2:3,
           B = 2:3,
           C = c("Oops wrong row", "this one is OK"))

I've written some code which achieves this. But it seems far more verbose than it needs to be. And the functions seem to rely on the incidental features of the dataframe in this example. I thought this might be a common task - is there a standard pattern for this kind of task? Or at least a more elegant approach?

library(dplyr)  # provides the %>% pipe used in correct_df()

df <- data.frame(A = 1:3,
                 B = 1:3,
                 C = c("Oops wrong row", NA, "this one is OK"))

get_row <- function(df, A, B, output = "index") {

  index <- which(df[["A"]] == A & df[["B"]] == B)

  if (output == "index") {
    return(index)
  }
  else if (output == "C") {
    return(df[["C"]][[index]])
  }

}

correct_df <- function(df) {

  from <- list(A = 1,
               B = 1)

  to <- list(A = 2,
             B = 2)

  df <- df %>%
    dplyr::mutate(C = replace(C,
                                 get_row(., to[["A"]], to[["B"]]),
                                 get_row(., from[["A"]], from[["B"]],
                                          output = "C"))) %>%
    dplyr::filter(A != from[["A"]] | B != from[["B"]])

  return(df)

}
joga

1 Answer

I suspect your real case is probably a bit more complex than your example, but this is the kind of task I normally do with dplyr::case_when().

Essentially, if you have criteria that define which rows need to change, you use them as logical conditions in the case_when() call. Note that I create a new variable rather than replacing the existing one, which makes checking what happened a lot easier.

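library(dplyr)

df <- data.frame(A = 1:3,
                 B = 1:3,
                 C = c("Oops wrong row", NA, "this one is OK"))

df %>%
  mutate(D = case_when(
    # the misplaced value's own row receives the NA
    C == "Oops wrong row" & !is.na(C) ~ C[is.na(C)],
    # the empty row receives the misplaced value
    is.na(C) ~ C[C == "Oops wrong row" & !is.na(C)],
    # every other row keeps its value
    TRUE ~ C
  ))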
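
If the rows are identified by key columns (A and B in the question) rather than by the value itself, the move-and-drop step could also be sketched in base R along these lines. This is only an illustration under assumptions: move_value() is a hypothetical helper name, not a dplyr function, and it assumes each key combination matches exactly one row.

```r
# Hypothetical helper: copy the value in `col` from the row matching `from`
# to the row matching `to`, then drop the `from` row.
# `from` and `to` are named lists of key-column values, e.g. list(A = 1, B = 1).
move_value <- function(df, from, to, col) {
  # Return the index of the single row whose key columns equal `key`
  match_row <- function(key) {
    hits <- Reduce(`&`, Map(function(name, value) df[[name]] == value,
                            names(key), key))
    idx <- which(hits)
    stopifnot(length(idx) == 1)  # assumption: keys identify exactly one row
    idx
  }

  from_idx <- match_row(from)
  to_idx <- match_row(to)

  df[[col]][to_idx] <- df[[col]][from_idx]  # move the value
  df[-from_idx, , drop = FALSE]             # remove the source row
}

df <- data.frame(A = 1:3,
                 B = 1:3,
                 C = c("Oops wrong row", NA, "this one is OK"),
                 stringsAsFactors = FALSE)

fixed <- move_value(df,
                    from = list(A = 1, B = 1),
                    to = list(A = 2, B = 2),
                    col = "C")
```

Because the key columns and the target column are passed as arguments, nothing in the helper depends on the names A, B, or C, and several corrections can be applied by calling it once per misplaced value.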
alexwhan