2

I am new to R and started exploring na.strings = c() function along with read.csv.

I have read that with this option all missing values are replaced with NA, but I don't see that happening in my files: the output is the same whether or not I use na.strings = c(). Please help if I am missing something. In both cases, I see NA when a numeric value is missing but not when a character value is missing. So, what is the use of using this function?

Here is my sample csv file:

Char,Numeric
A,3
B, 
 ,5

And my code:

DF_withoutNA = read.csv("filepath/R_NA.csv",header = TRUE)
DF_with = read.csv("filepath/R_NA.csv",header = TRUE,
                   na.strings = c("Char","Numeric"))
head(DF_withoutNA)
  Char Numeric
1    A       3
2    B      NA
3            5
head(DF_with)
  Char Numeric
1    A       3
2    B      NA
3            5
ZygD
lakshru

3 Answers

6

The na.strings argument applies to the body of the file: it lists the strings that should be read as NA. In your example, passing the empty string "" should match the missing character value, which is white space that has been stripped.

x <- read.csv("filepath/R_NA.csv",header=TRUE,na.strings=c(""))
x
  Char Numeric
1    A       3
2    B      NA
3 <NA>       5
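
If the blank field in the file really contains a space, whether "" matches may depend on white space being stripped first. The following is only a sketch under assumptions: the sample data is supplied inline through the text argument instead of the original file, and strip.white = TRUE is added so that a field holding a single space is trimmed to "" before the NA test. The names csv_text and df are illustrative.

```r
# Sample data from the question, supplied inline (assumption: the blank
# fields contain a single space, as they appear in the post).
csv_text <- "Char,Numeric\nA,3\nB, \n ,5"

df <- read.csv(
  text = csv_text,
  header = TRUE,
  strip.white = TRUE,        # trim unquoted character fields before the NA test
  na.strings = c("", "NA"),  # treat empty fields and the literal string NA as missing
  stringsAsFactors = FALSE   # keep Char as character on older R versions too
)

# Which entries were read as missing in each column
is.na(df$Char)
is.na(df$Numeric)
```

Listing "NA" alongside "" keeps the default behaviour, since a value passed to na.strings replaces the default rather than adding to it.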
mrbcuda
4

what is the use of using this function?

It replaces values (e.g., characters or numbers) in your csv file with NA. If you try read.csv("filepath/R_NA.csv", na.strings = "A"), you'll see that every A in the csv is replaced with NA.

PS. na.strings is the argument, not the function.
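
A minimal way to try this without the file, assuming the sample data is passed inline through the text argument (csv_text and df_a are illustrative names):

```r
# Sample data from the question, with truly empty fields for the missing values
csv_text <- "Char,Numeric\nA,3\nB,\n,5"

# Every field equal to "A" is read as NA
df_a <- read.csv(text = csv_text, na.strings = "A", stringsAsFactors = FALSE)

df_a$Char      # the first element is now NA instead of "A"
df_a$Numeric   # the blank numeric field is still read as NA
```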

pogibas
-1

na.strings marks the values you list as missing, so they are read in as NA. This is best done at the beginning of the data cleaning process.