By way of example, see the extraction of Twitter handles below. The goal is a character vector parallel to tweets in which each element holds only that tweet's handles, separated by commas. str_extract_all
yields empty vectors when no matches are found, and that threw some unexpected errors further down the track.
library(purrr)
library(stringr)
tweets <- c(
"",
"This tweet has no handles",
"This is a tweet for @you",
"This is another tweet for @you and @me",
"This, @bla, is another tweet for @me and @you"
)
mention_rx <- "@\\w+"
This was my first attempt:
map_chr(tweets, ~str_c(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))
#> Error: Result 1 must be a single string, not a character vector of length 0
Then I played around with things:
mentions <- map(tweets, ~str_c(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))
mentions
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> [1] "@you"
#>
#> [[4]]
#> [1] "@you, @me"
#>
#> [[5]]
#> [1] "@bla, @me, @you"
as.character(mentions)
#> [1] "character(0)" "character(0)" "@you" "@you, @me"
#> [5] "@bla, @me, @you"
Until it dawned on me that paste could also be used here:
map_chr(tweets, ~paste(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))
#> "" "" "@you" "@you, @me" "@bla, @me, @you"
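To separate the zero-length behaviour from the tweets themselves, here is a minimal base R sketch. It assumes nothing beyond base R, and the variable names (no_handles, uncollapsed, collapsed, deparsed) are made up for illustration. It only shows what paste and as.character do with a zero-length character vector; it does not explain why str_c differs.

```r
# Base R only; no packages are assumed here.
no_handles <- character(0)  # stands in for an extraction with no matches

# Without collapse, paste() is vectorised: zero-length in, zero-length out.
uncollapsed <- paste(no_handles)
length(uncollapsed)

# With collapse, paste() returns exactly one string,
# so a zero-length input becomes the empty string.
collapsed <- paste(no_handles, collapse = ", ")
identical(collapsed, "")

# as.character() on a list deparses elements that are not length one,
# which is where the literal text "character(0)" comes from.
deparsed <- as.character(list(no_handles, "@you"))
deparsed
```

So the paste version works with map_chr because every element it returns has length one, including the ones for tweets without handles.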
My questions are:
- Is there a more elegant way of getting there?
- Why doesn't str_c behave the same as paste with an identical collapse argument?
- Why don't as.character and map_chr recognise a character vector of length zero as equivalent to an empty string, but paste does?
I found some good references on str(i)_c, paste, and the difference between them, but none of them addressed the situation with empty strings.