I am working on genetic data and I have a huge output file (tab delimited text file), where in some columns I have missing values. These are left as white space.
I want to change the white space with NA or (.). How can I do this in R?
I am working on genetic data and I have a huge output file (tab delimited text file), where in some columns I have missing values. These are left as white space.
I want to change the white space with NA or (.). How can I do this in R?
Have you actually tried to read your file in? Under ?read.table
, the argument na.strings
it states:
na.strings
a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
So, I'm guessing (in lieu of a reproducible example)...
read.table("C:/myfile.txt , sep = "\t")
If you have blank space in columns with character
data, you can explicitly set na.strings = ""
which should make R consider all white space as NA
...
read.table("C:/myfile.txt , sep = "\t" , na.strings = "" )
Assuming you have a data frame called df
and a column called v1
, you can recode as missing any strings that are entirely white space via a combination of replace()
and grepl()
:
df$v1 <- replace(df$v1, grepl("^\\s*$", df$v1) == TRUE, NA)
As described by @Cath here, the grepl
portion searches the string for "0 or more" (*) spaces (\s) between the beginning (^) and end ($) of a string. If it the string matches those criteria, it's deemed TRUE
, otherwise FALSE
.
Nested within the replace
function, then, R will recode any observation in df$v1
that matches those criteria (i.e. that is TRUE
) as missing (i.e. NA
).