I have a large data frame (~500,000 observations) of Twitter data (e.g. username, retweet count, tweet text) in RStudio. I want to run a text analysis on the tweets, but first I need to remove the retweet tags so they don't affect my keyword searches.
For example, in tweets that are retweets, the text looks like this:

RT @BobsAccount Great article! Can't wait to learn more.
I want to remove the "RT @" tag together with the username attached to it (in this example, "RT @BobsAccount").
I have used lapply and gsub to remove specific characters. For example, this successfully removed "@":

data <- data.frame(lapply(data, function(x) {gsub("@","", x)}))
But I can't figure out how to remove a string pattern rather than a single fixed character (i.e. "RT @" plus whatever username is attached to it). Any help would be greatly appreciated!
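One possible approach, sketched under assumptions: pass a regular expression to gsub instead of a fixed string. The small data frame and the column name "text" below are hypothetical stand-ins for the real data, and the pattern assumes a handle consists only of letters, digits, and underscores, optionally followed by a colon (as in "RT @user: ...").

```r
# Hypothetical sample data standing in for the real data frame;
# the column names (username, retweets, text) are assumptions.
tweets <- data.frame(
  username = c("alice", "carol"),
  retweets = c(10L, 0L),
  text = c("RT @BobsAccount Great article! Can't wait to learn more.",
           "Just a normal tweet about @BobsAccount"),
  stringsAsFactors = FALSE
)

# Pattern: a word boundary, the literal "RT @", one or more word
# characters (the handle), an optional colon, then any trailing whitespace.
rt_pattern <- "\\bRT @\\w+:?\\s*"

# Apply gsub to the text column only, so other columns keep their types.
tweets$text <- gsub(rt_pattern, "", tweets$text, perl = TRUE)

print(tweets$text)
```

Because gsub is applied only to the text column, the retweet counts stay integer; running gsub over every column with lapply, as in the snippet above, coerces all columns to character. Note also that if "@" has already been stripped, the "RT @" pattern no longer matches, so the retweet tags should be removed first.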