1

I have a Dataframe in R where certain rows in some of the columns have values that are NA or empty strings "". I want to convert these to NULL values.

So I need any cells in my data frame that are NA or "" to be NULL. How can I do this?

When I try:

DF[ , DF$Column == NA] <- NULL

or

DF[ , DF$Column == ""] <- NULL

I get the error: missing values are not allowed in subscripted assignments of data frames

If I try:

DF[ , is.na(DF$Column)] <- NULL

I get the error: duplicate subscripts for columns

If I try:

is.na(DF$Column) <- NULL

or

DF[DF == NA] <- NULL

I dont get any errors, but nothing changes in my dataframe.

Jaap
  • 81,064
  • 34
  • 182
  • 193
brno792
  • 6,479
  • 17
  • 54
  • 71
  • 6
    This is a misuse of a NULL value in R. Vectors (which is essentially a column in a data.frame) cannot contain a mixture of observed values and NULL values. In R, we prefer NA to NULL to represent missing data. NULL values are used to remove elements from columns even entire columns. – MrFlick Oct 27 '14 at 22:14
  • 1
    Got it, thanks! So I will eventually be loading this data set into a SQL DB. Will SQL interpret NA as NULL? – brno792 Oct 27 '14 at 22:35
  • Depends how you do it. If you use `RODBC::sqlSave`, yeah, I'm pretty sure that's how it works. – Gregor Thomas Oct 27 '14 at 22:37

1 Answers1

1

There really is no NULL value in a vector. NA is the placeholder. If you want to remove the entire column (which is what assigning NULL would do) when its values are all NA then this would succeed:

df[ , sapply(df, function(x) all(is.na(x)) ) ] <- NULL

If you want to construct an object where you keep only those rows with no NA values:

df[ apply(df,1, function(rw) !any(is.na(rw)) ) , ]
IRTFM
  • 258,963
  • 21
  • 364
  • 487