2

I have a data table that have many columns of street address field, like NUM, STREET_PRE, STREETNAME, STREETTYPE,APT_NO, CITY, STATE, ZIP. Many rows don't have values in all columns, like STREET_PRE or APT_NO.

I need to get a address string from these columns. Using paste0 will put string "NA" into the result. I searched and found some discussions about this problem was just replacing "NA" afterwards, or using if else in concatenate. Replacing NA afterwards could bring problem if there is some valid input of "NA".

stringr's str_c looks promising because it will skip NAs. However I always get NA as result for this input:

>t1 = c(NA, "Charles County, MD", NA, "Charles County", "MD","00000") 
>str_c(t1, collapse = '')
[1] NA
>stri_c(t1,ignore_null = TRUE, collapse = '')
[1] NA

This input have some non-standard values but I still expect to get an address string. I'll let the geocoder to determine whether the address is valid.

It seemed to be a simple task but difficult to get what I want. paste, str_c, stri_c seemed all trying to match two vectors and concatenate them, but I only want to concatenate one serial of strings. They all work for normal cases, however the input with NA caused different problems in different functions.

EDIT The answer and comments below worked on single row of input but created unexpected result in data table. They may be working on the whole vector of columns when I only want them to work on current row.

I tried these lines without success:

address2011_MD_DC[, input_address := paste(na.omit(c(NUM_MILE,STREET_PRE,STREETNAME,STREETTYPE,STREETSUF,APT_NO)),collapse = " ") ]
address2011_MD_DC[, input_address :=
                str_c(na.omit(c(NUM_MILE,STREET_PRE,STREETNAME,STREETTYPE,STREETSUF,APT_NO)), collapse = ' ' )]

It could be the collapse parameter combine the column I want into single value. Maybe I have to come back to my original method of using paste0, but replace NAs with '' first.

dracodoc
  • 2,603
  • 1
  • 23
  • 33
  • 1
    Can you show the actual format your data is in? (Including the columns?) If it's a single vector, does `paste(t1[!is.na(t1)],collapse=" ")` not yield the desired result? – Heroka Nov 13 '15 at 14:15
  • It's a data frame/data table with many rows, each row have multiple columns. Yes this also works, thanks! – dracodoc Nov 13 '15 at 14:32
  • Sorry this does not work directly in data.table. Probably the collapse caused unintended result. – dracodoc Nov 13 '15 at 18:26

1 Answers1

2

From stringi documentation:

In case of any NA in an input vector, NA is set to the corresponding element. Note that this behavior is different from paste, which treats missing values as ordinary strings "NA".

Try to remove the NAs first:

library(stringi)
stri_c(na.omit(t1), collapse = "")
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
  • Yes I tried na.omit earlier, the direct result of na.omit give some numbers, I haven't try put na.omit inside the stri_c. This seemed work, but I was thinking stringr and stringi should be able to deal with NAs without help of na.omit since they mentioned different treatment of NAs, and have option of ignore_null. – dracodoc Nov 13 '15 at 14:34
  • @dracodoc `NULL` is not the same as `NA`. `NULL` represents the null object in R: it is a reserved word. `NULL` is often returned by expressions and functions whose values are undefined. `NA` is a logical constant of length 1 which contains a missing value indicator – Steven Beaupré Nov 13 '15 at 14:39
  • Using str_c or stri_c in data table seemed to be more complicated than I thought. DT[, address := str_c(na.omit(c(address columns)), collapse = "")] created unexpected results, actually my R-Studio lost response when I tried to look at the result. – dracodoc Nov 13 '15 at 15:14
  • Why not simply `df$newcol <- apply(df, 1, function(x) stri_c(x[!is.na(x)], collapse = ""))` ? – Steven Beaupré Nov 13 '15 at 16:52
  • Thanks, this do work. However my data table have lots of columns, to use this method I need to pick the columns I need to process first (the t1 in my question is a over simplified example), so it is not really simple for my case. Now I just replace all NAs with ''", then used paste0. With paste0 I have more control of separator, I need some of them to be " ", some of them to be ", ". Of course at last I need to remove extra white spaces caused by empty columns. – dracodoc Nov 13 '15 at 18:22
  • If you could post a `dput()` of your data it would help to come up with a more suitable solution. – Steven Beaupré Nov 13 '15 at 18:51