
I know there is a similar question on the forum ("R regex - extract words beginning with @ symbol"), but it does not answer my question, and I have already tried several approaches. I need to extract all words that do not have a # symbol in front of them.

Below is code that extracts all words that begin with the # sign, followed by its result.

library(stringr)
tweeter <- c("#tweeter tweet", "h#is", "tweet #tweeter2", "twet")
str_extract_all(tweeter, "(?<=\\B\\#)[^\\s]+")

Result:

[[1]]
[1] "tweeter"

[[2]]
character(0)

[[3]]
[1] "tweeter2"

[[4]]
character(0)

Now the code with which I tried to display all the words without # at the beginning:

regmatches(tweeter, gregexpr("\\B#\\S+", tweeter), invert = TRUE)

I would like to display a list of words that don't start with #; I just don't know how to write it properly.

Ronak Shah

2 Answers


This gives you the elements of tweeter that don't begin with #:

library(stringr)

tweeter[!str_detect(tweeter, "^#")]
# "h#is"            "tweet #tweeter2" "twet"  

Explanation

str_detect(tweeter, "^#") returns a logical vector indicating which elements match the pattern, in this case ^#. The ^ anchors the match at the beginning of the string, and # is the literal character that must appear there.

The ! negates that logical vector, and subsetting with tweeter[...] returns the elements for which the negated condition is TRUE.

The same result can be achieved without ! by using the negate argument of str_detect, as follows:
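To make the steps described above easier to follow, here is a minimal sketch that splits the one-liner into its intermediate values. The variable names starts_with_hash and keep are introduced only for illustration and are not part of the original answer.

```r
library(stringr)

tweeter <- c("#tweeter tweet", "h#is", "tweet #tweeter2", "twet")

# Logical vector: TRUE where the string starts with "#"
starts_with_hash <- str_detect(tweeter, "^#")
print(starts_with_hash)
# [1]  TRUE FALSE FALSE FALSE

# Negate it so TRUE marks the strings we want to keep
keep <- !starts_with_hash

# Subset the original vector with the negated logical vector
print(tweeter[keep])
# [1] "h#is"            "tweet #tweeter2" "twet"
```

Note that "h#is" is kept because its # is not at the start of the string.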

tweeter[str_detect(tweeter, "^#", negate = TRUE)]
# "h#is"            "tweet #tweeter2" "twet" 
deepseefan

In base R, we can use grep with the invert and value arguments set to TRUE.

grep("^#", tweeter, invert = TRUE, value = TRUE)
#[1] "h#is"            "tweet #tweeter2" "twet" 
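The call above filters whole strings. If individual words are wanted instead (one reading of the question, and an assumption here), a possible sketch is to split each string on whitespace first and apply the same grep call to each piece. The names words and words_without_hash are hypothetical and used only for this example.

```r
tweeter <- c("#tweeter tweet", "h#is", "tweet #tweeter2", "twet")

# Split each string on runs of whitespace to get its individual words
words <- strsplit(tweeter, "\\s+")

# For each element, keep only the words that do not start with "#"
words_without_hash <- lapply(
  words,
  function(w) grep("^#", w, invert = TRUE, value = TRUE)
)

print(words_without_hash)
```

This returns a list with one character vector per input string, so it has the same shape as the str_extract_all result in the question. Wrapping it in unlist() would flatten it into a single character vector if that is preferred.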
Ronak Shah