I want to remove words shorter than 3 characters from a string. For example, my input is
str<- c("hello RP have a nice day")
I want my output to be
str<- c("hello have nice day")
Please help
Try this:
gsub('\\b\\w{1,2}\\b','',str)
[1] "hello  have  nice day"
EDIT: \b is a word boundary. The call above leaves the spaces around each removed word behind. If you need to drop the extra spaces, change it to:
gsub('\\b\\w{1,2}\\s','',str)
Or
gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=TRUE)
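Note that the two patterns that consume a trailing \s will not remove a short word at the very end of the string, since nothing follows it. One possible way around that, as a rough sketch: remove the short words with the \b version and then tidy the whitespace in a second step. The helper name remove_short_words and its min_len argument are made up for illustration, and perl=TRUE is my own choice here, not something the calls above require.

```r
# Hypothetical helper (name and arguments are illustrative), base R only.
remove_short_words <- function(x, min_len = 3) {
  # Match whole words of 1 to (min_len - 1) word characters
  pattern <- sprintf("\\b\\w{1,%d}\\b", min_len - 1)
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse runs of whitespace left behind, then trim both ends
  trimws(gsub("\\s+", " ", out))
}

remove_short_words("hello RP have a nice day")
# [1] "hello have nice day"
remove_short_words("go to the big city of RP")
# [1] "the big city"
```

Because \w does not match an apostrophe, a word like "don't" is treated as two words by this pattern, so it is probably only suitable for text without contractions.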
Or use str_extract_all to extract all words of length >= 3 and paste them back together:
library(stringr)
paste(str_extract_all(str, '\\w{3,}')[[1]], collapse=' ')
#[1] "hello have nice day"
Here's an approach using the rm_nchar_words function from the qdapRegex package, which I coauthored with @hwnd (SO regex guru extraordinaire). Here I show removing 1-2 letter words and then 1-3 letter words:
str<- c("hello RP have a nice day")
library(qdapRegex)
rm_nchar_words(str, "1,2")
## [1] "hello have nice day"
rm_nchar_words(str, "1,3")
## [1] "hello have nice"
As qdapRegex aims to teach, here is the regex behind the scenes, where the S function puts 1,2 into the quantifier curly braces:
S("@rm_nchar_words", "1,2")
## "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"
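To see what the apostrophe handling in that pattern buys you, here is a small sketch that feeds the regex shown above straight to base gsub with perl=TRUE. This is only my approximation of what rm_nchar_words does (the whitespace clean-up step is an assumption added by hand), and the variable names are illustrative.

```r
# Pattern copied from the S() output above; needs perl = TRUE for the lookarounds.
pat <- "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"

x <- "I'm ok with that"

# Apostrophe-aware pattern: "I'm" counts as one 2-letter word and is removed whole
aware <- trimws(gsub("\\s+", " ", gsub(pat, "", x, perl = TRUE)))
aware
# [1] "with that"

# Plain \b\w{1,2}\b treats "I" and "m" separately and leaves the apostrophe behind
plain <- trimws(gsub("\\s+", " ", gsub("\\b\\w{1,2}\\b", "", x, perl = TRUE)))
plain
# [1] "' with that"
```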
x <- "hello RP have a nice day"
z <- unlist(strsplit(x, split=" "))
paste(z[nchar(z)>=3], collapse=" ")
# [1] "hello have nice day"