How to replace Hebrew characters with NA in R

Question

I have a table in which some columns contain the value "Unknown" written in Hebrew ("לא ידוע"), and I am trying to replace those values with NA.

I have seen other answers use functions like:

replace_with_na_a

or:

df[df == "לא ידוע"] <- NA

But neither of them worked. Is there a way to replace these values?

I should point out that I took the following steps to get the Hebrew text in the table to display:

write.csv(dataset,"D:/Doctorate/Courses/R/data_intro_r_test.csv", row.names = FALSE)

# Set a Hebrew locale and read the table, including its Hebrew text
Sys.setlocale("LC_ALL", "Hebrew")
sample.data <- read_csv(file = "data_intro_r_test.csv",
                        locale = locale(date_names = "he", encoding = "UTF-8"))
line <- readLines("D:/Doctorate/Courses/R/data_intro_r_test.csv", encoding = "UTF-8")
iconv(line, "ISO-8859-8", "UTF-8")

Thank you, Maya

Already checked [these](https://stackoverflow.com/q/17517319/6574038) solutions? — jay.sf, Jan 23 '22 at 12:54
If you know of a particular location that has that string, maybe test compared to it? E.g. if entry 1 of column 2 has it, use `df[df == df[1,2]] <- NA`. — user2554330, Jan 23 '22 at 15:37

score 0 · Answer 1 · answered Jan 23 '22 at 16:43

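We first have to determine which columns in df are of class character. Then, in those columns, we replace with NA every string that contains at least one character outside the Latin alphabet (a-z, A-Z). Note that this also catches strings containing spaces, digits or punctuation.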

df <- data.frame(
  numbers = rnorm(n = 4L),
  factor = gl(2L, 2L, labels = c('male', 'female')),
  string = c('I', 'use', 'regex', 'לא ידוע')
)
# Loop over the columns; only character columns are modified.
for (i in seq_len(ncol(df))) {
  if (is.character(df[[i]])) {
    # Set to NA every string with a character outside a-z / A-Z.
    df[[i]][grepl('[^a-zA-Z]', df[[i]])] <- NA
  }
}

> df
     numbers factor string
1  0.9352600   male      I
2 -0.5864270   male    use
3 -0.6104655 female  regex
4  0.3481119 female   <NA>
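
If the goal is to remove only the "unknown" marker rather than every non-Latin string, a narrower variant may be preferable. The following is a minimal sketch, not a tested fix for the asker's file: the data frame and its column names are hypothetical, and it assumes the comparison in the question failed because of source-encoding problems or stray whitespace, which the question does not confirm. The Hebrew string is written with Unicode escapes so that it does not depend on the encoding of the script file.

```r
# "לא ידוע" written with Unicode escapes (independent of the script's encoding).
unknown <- "\u05dc\u05d0 \u05d9\u05d3\u05d5\u05e2"

# Hypothetical example data; the column names are made up for illustration.
df <- data.frame(
  id = 1:4,
  status = c("ok", unknown, paste0(" ", unknown, " "), "done"),
  stringsAsFactors = FALSE
)

# Restrict the replacement to character columns.
is_chr <- vapply(df, is.character, logical(1))

# Replace exact matches (after trimming surrounding whitespace) with NA.
df[is_chr] <- lapply(df[is_chr], function(x) {
  x[!is.na(x) & trimws(x) == unknown] <- NA
  x
})
```

Unlike the regex loop above, this leaves other values that contain spaces, digits or non-Latin letters untouched.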
