I am working on genetic data and I have a huge output file (tab delimited text file), where in some columns I have missing values. These are left as white space.

I want to change the white space with NA or (.). How can I do this in R?

seaotternerd
Irene Pappa

2 Answers

Have you actually tried to read your file in? Under ?read.table, the description of the na.strings argument states:

na.strings
a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.

So, I'm guessing (in lieu of a reproducible example)...

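read.table("C:/myfile.txt", sep = "\t")

If you have blank fields in columns with character data, you can explicitly set na.strings = "", which should make R read empty fields as NA. If those fields contain spaces rather than being truly empty, adding strip.white = TRUE should strip the spaces before the comparison with na.strings is made.

read.table("C:/myfile.txt", sep = "\t", na.strings = "")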
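
Since there is no reproducible example, here is a minimal sketch of the idea using made-up inline data instead of a file. The column names and values are assumptions for illustration only; the text = argument stands in for the real file path.

```r
# Hypothetical tab-delimited input: the 'allele' column has one empty field
# and one field holding only a space; the last 'score' field is empty.
txt <- "id\tallele\tscore\n1\tA\t0.5\n2\t\t0.7\n3\t \t\n"

dat <- read.table(text = txt, sep = "\t", header = TRUE,
                  na.strings = "",      # treat empty fields as NA
                  strip.white = TRUE,   # strip spaces so " " becomes ""
                  stringsAsFactors = FALSE)

print(dat)
print(is.na(dat$allele))
```

With a real file, replace text = txt with the path to the file.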
Simon O'Hanlon
  • Yes, I understand that. But the problem is that I want to save a new file where I can actually see NA, when I open it with any other program besides R. – Irene Pappa Aug 09 '13 at 12:22
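
To make the NA values visible in other programs, the data frame can be written back out with write.table(), whose na argument sets the string used for missing values. A minimal sketch, assuming a small made-up data frame and a temporary output path (both are placeholders, not the asker's real data):

```r
# Hypothetical data frame standing in for the imported genetic data.
dat <- data.frame(id = 1:3,
                  allele = c("A", NA, NA),
                  score = c(0.5, 0.7, NA),
                  stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")  # placeholder output path

# na = "NA" is the default; use na = "." to write a dot instead.
write.table(dat, file = out_file, sep = "\t", na = "NA",
            quote = FALSE, row.names = FALSE)

# Read the raw lines back to check what another program would see.
written <- readLines(out_file)
print(written)
```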
Assuming you have a data frame called df and a column called v1, you can recode as missing any strings that are entirely white space via a combination of replace() and grepl():

df$v1 <- replace(df$v1, grepl("^\\s*$", df$v1), NA)

As described by @Cath here, the grepl portion checks whether the string consists of "0 or more" (*) whitespace characters (\s) between its beginning (^) and end ($). If the string matches those criteria, grepl returns TRUE, otherwise FALSE.

Nested within the replace function, then, R will recode any observation in df$v1 that matches those criteria (i.e. that is TRUE) as missing (i.e. NA).
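
The same recoding can be applied to every character column at once. This is only a sketch in base R: the data frame and the helper name blank_to_na are made up for illustration.

```r
# Hypothetical data frame: v1 and v2 hold strings, v3 is numeric.
df <- data.frame(v1 = c("AA", "  ", ""),
                 v2 = c(" ", "AG", "GG"),
                 v3 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Helper (name is an assumption): recode whitespace-only strings as NA,
# leaving non-character columns untouched.
blank_to_na <- function(x) {
  if (is.character(x)) replace(x, grepl("^\\s*$", x), NA) else x
}

# df[] keeps the data frame structure while replacing each column.
df[] <- lapply(df, blank_to_na)
print(df)
```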

coip
  • Here is an alternative method for all columns: https://stackoverflow.com/questions/76815200/r-dplyr-replace-strings-with-only-spaces-for-all-columns – Aaron C Aug 02 '23 at 03:35