I have retrieved many tweets from Twitter using the R package twitteR.
Now that this has worked, my goal is to create edges for a network analysis based on the mentions in those tweets. For this purpose I used the following code to extract the Twitter usernames mentioned in each tweet:
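library(stringr)  # provides str_extract_all()
tweets <- read.csv(file="tweets.csv", stringsAsFactors=FALSE)
# returns a list with one character vector of mentions per tweet
tweets$mentions <- str_extract_all(tweets$text, "@\\w+")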
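To make the resulting structure concrete, here is a minimal sketch on two made-up tweets (the example texts are assumptions, not my real data). It uses base R's regmatches() and gregexpr() so that it runs without extra packages; as far as I can tell, str_extract_all() produces the same kind of list.

```r
# Hypothetical example data standing in for tweets.csv
tweets <- data.frame(
  text = c("hi @usernameA @usernameB and @usernameC", "hello @usernameD"),
  stringsAsFactors = FALSE
)

# One character vector of mentions per tweet, stored as a list column
tweets$mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# Number of mentions found in each row
n_mentions <- lengths(tweets$mentions)
print(n_mentions)
```

So the mentions column is a list, and each entry is a character vector whose length is the number of mentions in that tweet.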
Some tweets mention more than one username, for example "usernameA, usernameB and usernameC", and all of these end up together in one row. I would like to duplicate each such row so that a tweet with several mentions appears once per mentioned username, with exactly one username per row. Let me illustrate what I mean using the example above:
At the moment I have a row with two columns (text, mentions):
- "text of the tweet"; "usernameA, usernameB, usernameC"
I would like to have three rows in this case:
- "text of the tweet"; "usernameA"
- "text of the tweet"; "usernameB"
- "text of the tweet"; "usernameC"
My problems are:
- How do I let R check for entries in a specified column that consist of several elements, i.e. a vector such as c("usernameA", "usernameB", ...)?
- How do I tell R to duplicate such an entry x-1 times (x = number of mentions)?
- How do I get R to leave only one username in each row?
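One possible approach, sketched here in base R on made-up data (the data frame contents and the names n and edges are assumptions for illustration): lengths() gives the number of mentions per row, rep() repeats each tweet text that many times, and unlist() flattens the list column so that each row keeps a single username.

```r
# Hypothetical data in the shape described above: a list column of mentions
tweets <- data.frame(
  text = c("text of the tweet", "another tweet"),
  stringsAsFactors = FALSE
)
tweets$mentions <- list(c("usernameA", "usernameB", "usernameC"), "usernameD")

# Number of mentions in each row (x in the question)
n <- lengths(tweets$mentions)

# Repeat each text once per mention and flatten the mentions,
# giving one row per (tweet, username) pair
edges <- data.frame(
  text = rep(tweets$text, times = n),
  mention = unlist(tweets$mentions),
  stringsAsFactors = FALSE
)
print(edges)
```

Note that in this sketch tweets without any mention have n equal to 0 and therefore drop out of edges, which seems acceptable for building network edges. If the tidyverse is an option, tidyr::unnest(tweets, mentions) should do a similar expansion.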