I have an R data frame that has character(0) and list() values inside the cells. I want to replace these with NA values.

In the following example, the field "teaser" has this issue, but it can be anywhere in the data frame.

df <- structure(list(body = "BAKER TO VEGAS 2022The Office fielded two squads this year in the 36th Annual Baker to Vegas (“B2V”) Challenge Cup Relay on April 9-10.  Members of our 2022 B2V Team include many staff and AUSAs who were joined by office alums and a cadre of friends and family who helped out during some rather brutal conditions this year with temperatures around 100 degrees for much of the days. Most importantly, everyone had fun… and nobody got hurt!  It was a great opportunity to meet (and run past) various members of our law enforcement community and to see the amazing logistics of the yearly event. Congratulations to all the participants.", 
    changed = structure(19156, class = "Date"), created = structure(19156, class = "Date"), 
    date = structure(19090, class = "Date"), teaser = "character(0)", 
    title = "Baker to Vegas 2022", url = "https://www.justice.gov/usao-cdca/blog/baker-vegas-2022", 
    uuid = "cd7e1023-c3ed-4234-b8af-56d342493810", vuuid = "8971702d-6f96-4bbd-ba8c-418f9d32a486", 
    name = "USAO - California, Central,"), row.names = 33L, class = "data.frame")

I've tried numerous things that don't work, including the following:

df <- na_if(df, "character(0)")

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

Thanks for your help.

jay.sf
generic
  • You are attempting to apply `na_if` *to the entire data.frame*, but it expects a *vector* as its input. The documentation of `na_if` contains an example of how to apply the function across table columns. However, I would instead try to solve the root cause of the issue: where are these values coming from in the first place? – Konrad Rudolph Dec 22 '22 at 15:36
  • Do you know if there's a way to do something similar but to the entire data frame? Or do I have to do one column at a time? – generic Dec 22 '22 at 15:38
  • See my comment update: the `na_if` documentation shows how to do this. – Konrad Rudolph Dec 22 '22 at 15:39
  • Thanks -- I will look at the `na_if()` documentation. Sadly, they come straight from an API pull that is outside my influence. – generic Dec 22 '22 at 15:40
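
The error in the question can be reproduced in base R. This is a minimal sketch, assuming that applying `na_if` to the whole data frame ends up comparing every column, including the Date columns, with the string "character(0)"; the names `d`, `s`, and `msg` are illustrative only.

```r
# Toy values mirroring the question: one Date, one character string.
d <- as.Date("2022-06-13")
s <- "character(0)"

# Comparing a Date with a string makes R try to parse the string as a date,
# which fails for "character(0)". The error message is captured, not raised.
msg <- tryCatch(
  {
    d == "character(0)"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same comparison on a character vector is harmless.
print(s == "character(0)")
```

This suggests restricting the replacement to character columns, which is what both answers below do.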

2 Answers

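We could use `across()` to apply `na_if()` to the character columns only, which leaves the Date columns untouched (assign the result back to `df` to keep it):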

library(dplyr)
df %>%
   mutate(across(where(is.character), ~ na_if(.x, "character(0)")))
akrun

Here is a base R way.

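  1. Create a logical index that is TRUE for the columns of class "character".
  2. With lapply, build a list of logical vectors marking the "character(0)" cells in those columns.
  3. With Map (a wrapper around mapply), set the marked values to NA.

# Which columns are character? (The \(x) lambda syntax needs R >= 4.1.)
i_chr <- sapply(df, is.character)
# For each character column, flag the cells equal to "character(0)"
inx_list <- lapply(df[i_chr], \(x) x == "character(0)")
# Set the flagged cells to NA, column by column
df[i_chr] <- Map(\(x, i) {is.na(x) <- i; x}, df[i_chr], inx_list)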
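
Since the question also mentions `list()` values, the steps above can be wrapped in a small function that handles several placeholder strings at once. This is only a sketch: `blank_to_na`, its `bad` argument, and the `toy` data frame are hypothetical names introduced here, and it assumes the bad values are stored as the literal strings "character(0)" and "list()", as in the `dput` output in the question.

```r
# Hypothetical helper: replace placeholder strings with NA,
# touching character columns only so Date columns are never compared.
blank_to_na <- function(data, bad = c("character(0)", "list()")) {
  i_chr <- vapply(data, is.character, logical(1))
  if (!any(i_chr)) return(data)  # nothing to do without character columns
  data[i_chr] <- lapply(data[i_chr], function(x) {
    x[x %in% bad] <- NA_character_
    x
  })
  data
}

# Made-up data for illustration (not from the API in the question).
toy <- data.frame(
  date   = as.Date(c("2022-04-09", "2022-04-10")),
  teaser = c("character(0)", "A short teaser"),
  tags   = c("list()", "running"),
  stringsAsFactors = FALSE
)

clean <- blank_to_na(toy)
print(clean)
```

Using `%in%` rather than `==` lets one call cover both placeholders and treats existing NA values as non-matches.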
Rui Barradas