I have a data table that have many columns of street address field, like NUM
, STREET_PRE
, STREETNAME
, STREETTYPE
,APT_NO
, CITY
, STATE
, ZIP
. Many rows don't have values in all columns, like STREET_PRE
or APT_NO
.
I need to get a address string from these columns. Using paste0 will put string "NA" into the result. I searched and found some discussions about this problem was just replacing "NA" afterwards, or using if else in concatenate. Replacing NA afterwards could bring problem if there is some valid input of "NA".
stringr
's str_c
looks promising because it will skip NAs. However I always get NA as result for this input:
>t1 = c(NA, "Charles County, MD", NA, "Charles County", "MD","00000")
>str_c(t1, collapse = '')
[1] NA
>stri_c(t1,ignore_null = TRUE, collapse = '')
[1] NA
This input have some non-standard values but I still expect to get an address string. I'll let the geocoder to determine whether the address is valid.
It seemed to be a simple task but difficult to get what I want. paste
, str_c
, stri_c
seemed all trying to match two vectors and concatenate them, but I only want to concatenate one serial of strings. They all work for normal cases, however the input with NA caused different problems in different functions.
EDIT The answer and comments below worked on single row of input but created unexpected result in data table. They may be working on the whole vector of columns when I only want them to work on current row.
I tried these lines without success:
address2011_MD_DC[, input_address := paste(na.omit(c(NUM_MILE,STREET_PRE,STREETNAME,STREETTYPE,STREETSUF,APT_NO)),collapse = " ") ]
address2011_MD_DC[, input_address :=
str_c(na.omit(c(NUM_MILE,STREET_PRE,STREETNAME,STREETTYPE,STREETSUF,APT_NO)), collapse = ' ' )]
It could be the collapse parameter combine the column I want into single value. Maybe I have to come back to my original method of using paste0
, but replace NAs with '' first.