I need to extract the Twitter handles mentioned in retweets and build a list of usernames. How can I select every piece of text within a string that starts with "@"? Here's an example of a retweet:

@MyBrianLeyh @IngrahamAngle @TombStoneBub @MeticulousPaul @kjross1970 @RealTT2020 @busylizzie48 @LaylaAlisha11…

Thanks!

I've tried the MID function in Excel to select the usernames. However, MID requires me to specify the length of the substring to extract, and usernames have different lengths, so the results are not accurate.

duckmayr

1 Answer

Here I use a string containing some Twitter handles mixed with other text as an example. `strapplyc` pulls out all the text between an `@` and the next space (the pattern `@(.*?)\\ `).

# Test string
test <- "@MyBrianLeyh @IngrahamAngle @TombStoneBub @MeticulousPaul @kjross1970 @RealTT2020 This is part of a tweet @busylizzie48 @LaylaAlisha11 This is another part"

# Load library
library(gsubfn)
#> Loading required package: proto

# Extract all handles between @ and a space
strapplyc(test, "@(.*?)\\ ", simplify = c)
#> [1] "MyBrianLeyh"    "IngrahamAngle"  "TombStoneBub"   "MeticulousPaul"
#> [5] "kjross1970"     "RealTT2020"     "busylizzie48"   "LaylaAlisha11"

Created on 2019-03-28 by the reprex package (v0.2.1)
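
The pattern above only matches a handle that is followed by a space, so a handle at the very end of a string would be skipped. As a hedged alternative, here is a minimal base R sketch that matches `@` followed by word characters instead (Twitter handles consist of letters, digits, and underscores). The shortened `test` string is made up for illustration, and this approach is a swap-in for `strapplyc`, not part of the `gsubfn` example.

```r
# Illustrative test string: note the last handle has no trailing space
test <- "@MyBrianLeyh @IngrahamAngle This is part of a tweet @LaylaAlisha11"

# Match "@" followed by one or more word characters (letters, digits, underscore)
matches <- regmatches(test, gregexpr("@\\w+", test))[[1]]

# Drop the leading "@" to keep only the handle names
handles <- sub("^@", "", matches)
handles
#> [1] "MyBrianLeyh"   "IngrahamAngle" "LaylaAlisha11"
```

Because `\\w+` stops at the first non-word character, trailing punctuation such as a colon after a handle is not captured either.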

Dan
  • I tried this with a csv file that contains a list of retweets. It returns the usernames, one per cell, but the output does not map the handles back to each tweet. Is there a command that lists the handles mentioned in each tweet in separate columns of the same row? – Chamil Rathnayake Mar 28 '19 at 12:52
  • @ChamilRathnayake It's hard to say without a reproducible example. Can you edit your question to include the data structure (or a subset of the data structure) you're working with? For example, using `dput`. – Dan Mar 28 '19 at 13:04
  • This is what I tried. I had the retweets in a csv file: `library(gsubfn); handles <- strapplyc(rts, "@(.*?)\\ ", simplify = c); write.csv(handles, "hanldes_2.csv")` – Chamil Rathnayake Mar 28 '19 at 13:08
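
Without a reproducible example it is hard to be sure, but the mapping is probably lost because `simplify = c` concatenates the per-tweet results into one flat vector. One way to keep one row per tweet, sketched here in base R, is to extract a separate vector of handles for each tweet and pad the shorter ones with `NA`. The contents of `rts` below are made up as a stand-in for the retweet column from the csv file, the column names `tweet` and `handle_1`, `handle_2`, ... are hypothetical, and the sketch assumes at least one tweet contains a handle.

```r
# Hypothetical stand-in for the retweet column read from the csv file
rts <- c("RT @alice: hello @bob",
         "no handles here",
         "@carol @dave @erin thanks")

# One character vector of handles per tweet; the list keeps tweets separate
per_tweet <- regmatches(rts, gregexpr("@\\w+", rts))
per_tweet <- lapply(per_tweet, function(h) sub("^@", "", h))

# Pad every vector with NA up to the largest number of handles in any tweet
n_max <- max(lengths(per_tweet))
padded <- lapply(per_tweet, function(h) c(h, rep(NA_character_, n_max - length(h))))

# Bind the padded vectors into a matrix: one row per tweet, one column per handle
handle_mat <- do.call(rbind, padded)
colnames(handle_mat) <- paste0("handle_", seq_len(n_max))

# Keep the original tweet next to its handles
result <- data.frame(tweet = rts, handle_mat, stringsAsFactors = FALSE)
result
```

The resulting data frame can then be saved with `write.csv(result, "handles_by_tweet.csv", row.names = FALSE)`, where the file name is only an example.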