
I am using the rtweet package by Michael W. Kearney to get the follower lists of multiple users. So far it works beautifully when I scrape the followers of one user at a time, regardless of how many followers he or she has.

But my project requires scraping 155 profiles, so I was wondering whether there is a function or approach that lets me write one command for all the users. When I try to pass more than one user, I get an error saying I can only use one user at a time.

EDIT: Two important pieces of information. First, the desired output is a dataset with 155 columns, one per user's followers, which I can export as a CSV or use as a data frame. Second, please keep in mind that if I use any function from the apply family I end up with a list (as in Amar's answer), and converting that list into a data.frame is the problem: since the columns have unequal lengths, I cannot use the as.data.frame() function.
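For reference, one base-R way to get the wide layout described in this edit is to pad every user's follower vector with `NA` up to the length of the longest one, after which `as.data.frame()` no longer complains about unequal lengths. This is a minimal sketch: the `followers` list below is hypothetical stand-in data (real rtweet output would be one tibble per user with a `user_id` column), and the CSV is written to a temporary file.

```r
# Hypothetical stand-in data: a named list of follower-ID vectors with
# unequal lengths, one element per profile.
followers <- list(
  Batman   = c("101", "102", "103"),
  CatWomen = c("201"),
  Blade    = c("301", "302")
)

# Pad every vector with NA up to the length of the longest one,
# so all future columns have the same number of rows.
max_len <- max(lengths(followers))
padded  <- lapply(followers, function(ids) {
  c(ids, rep(NA_character_, max_len - length(ids)))
})

# Equal lengths now, so as.data.frame() works: one column per profile.
wide <- as.data.frame(padded, stringsAsFactors = FALSE)

# Export as CSV (a temporary path here; use your own file name).
csv_path <- tempfile(fileext = ".csv")
write.csv(wide, csv_path, row.names = FALSE)
```

Keep in mind that the padded frame has as many rows as the largest account has followers, so with one account in the millions and 155 columns most of the cells are `NA`; a long layout (one row per profile/follower pair) is usually far more compact.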

Any ideas or workarounds? Thanks in advance.



Use either a for loop or apply to get the job done. The package is created to simplify the interaction between you and the twitter API by providing functions that do one particular job (usually mirroring the API). The rest (such as your request) is filled in by R and other packages.

(I am following this tutorial here:)

Let's say you have the profiles you want to scrape in a character vector:

profiles <- c("Batman", "CatWomen", "Blade")

We can write a custom function that retrieves all the followers of a single profile:

library(rtweet)

getAllFollowers <- function(name) {
  ## how many followers does this profile have in total?
  user_info <- lookup_users(name)

  ## get them all; retryonratelimit = TRUE makes rtweet wait out the rate
  ## limit instead of stopping (accounts with millions of followers can take days)
  user_follower <- get_followers(name, n = user_info$followers_count,
                                 retryonratelimit = TRUE)
  Sys.sleep(2) # sleep for 2 seconds before the next profile
  return(user_follower)
}

We can then use lapply to iterate over the vector of profiles and retrieve their followers:

out <- lapply(X = profiles, FUN = getAllFollowers)
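Note that `lapply()` over an unnamed character vector returns an unnamed list, which makes it hard to tell which element belongs to which profile. A small sketch of two ways to keep the profile names attached; `fake_getAllFollowers()` is a hypothetical offline stand-in so the example runs without Twitter credentials, and in practice you would pass `getAllFollowers` instead.

```r
# Hypothetical stand-in for getAllFollowers(): returns a data frame with a
# single user_id column, mimicking the shape of get_followers() output.
fake_getAllFollowers <- function(name) {
  data.frame(user_id = paste0(name, "_", seq_len(2)), stringsAsFactors = FALSE)
}

profiles <- c("Batman", "CatWomen", "Blade")

# Option 1: lapply() returns an unnamed list; setNames() labels each element.
out <- setNames(lapply(profiles, fake_getAllFollowers), profiles)

# Option 2: sapply() with simplify = FALSE uses the character vector itself
# as names, giving the same named list in one call.
out2 <- sapply(profiles, fake_getAllFollowers, simplify = FALSE)

# Each profile's followers can now be accessed by name.
out$Batman$user_id
```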

`out` is a list with one element per profile; according to the `get_followers()` documentation, each element is:

A tibble data frame of follower IDs (one column named "user_id").
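Because each element holds only a `user_id` column, the profile a follower belongs to is carried by the element's position or name in the list, not by the data itself. If a single table is needed, one option is to tag every row with its profile and stack the pieces into a long, two-column table. This is a sketch assuming `out` is a named list; the data below is a hypothetical stand-in for real rtweet output.

```r
# Hypothetical stand-in for `out`: a named list of data frames, each with a
# single user_id column.
out <- list(
  Batman   = data.frame(user_id = c("101", "102", "103"), stringsAsFactors = FALSE),
  CatWomen = data.frame(user_id = "201", stringsAsFactors = FALSE),
  Blade    = data.frame(user_id = c("301", "302"), stringsAsFactors = FALSE)
)

# Add a profile column to every piece, using the list names as labels.
tagged <- Map(function(df, profile) {
  data.frame(profile = profile, user_id = df$user_id, stringsAsFactors = FALSE)
}, out, names(out))

# Stack all pieces into one long data frame and reset the row names.
long <- do.call(rbind, tagged)
rownames(long) <- NULL

# Number of followers per profile.
table(long$profile)
```

If dplyr is installed, `dplyr::bind_rows(out, .id = "profile")` produces the same kind of long table in one call. A two-column profile/follower table is also the edge-list shape that network tools such as `igraph::graph_from_data_frame()` accept.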

Amar
  • "A tibble data frame of follower IDs (one column named "user_id")." Does this mean all the followers I scraped with your method will appear in one column? Then I would have no way to tell where Batman's followers end and CatWomen's begin. Moreover, I want each user's followers in a distinct column. Converting the list into a data frame and then into a CSV should not be a problem, but having everything in one column is an issue, isn't it? – Waqas Chaudhary May 08 '18 at 07:38
  • No, they will all be separate. You will have a list called `out` with 3 objects, one for each profile. Each of these objects is a tibble data frame with a single column called `user_id`. It's like running the function 3 times separately and then combining the output like this: `out <- list(out_batman, out_catwomen, out_blade)` – Amar May 08 '18 at 07:49
  • The function you wrote needs some formatting: `getAllFollowers <- function(name) { user_info <- lookup_users(name); user_follower <- get_followers(name, n = user_info$followers_count, retryonratelimit = TRUE); Sys.sleep(2); return(user_follower) }`. Also, I find sapply better because it gives `profilename$user_id`. Lastly, the output is not a df but a list. Though it is a tibble, I can't use simple df commands, e.g. `head(out)`, and `is.list(out)` also gives `TRUE`. I am still new to R and to distinguishing between data formats, so I am still trying to get the output in the right format, but thanks, it did work :) – Waqas Chaudhary May 08 '18 at 10:14
  • You never specified your desired output; update your post with this info. To be honest with you, you've given me very little to work with. The more info the better. And yes, the output is a list of tibbles. I said this explicitly above. – Amar May 08 '18 at 10:39
  • I have edited the post and accepted your answer. Just out of curiosity, do you have any idea about what I mentioned in the edited version of the post? – Waqas Chaudhary May 08 '18 at 20:19
  • What are you doing with the list of followers downstream? As you mentioned, it's not easily convertible into a df. IMO a list of lists is the best object type to work with in such a case. You just have to retool your code to work with lists (which might actually be easier!). It's possible to have a "long" style df with columns such as `User` and `followers`, where `followers` is a list column. This way you can interact with it as you would with a df, but `followers` is a list object. – Amar May 08 '18 at 22:30
  • Yes! I guess it makes sense to work with lists, because at times I have over a million rows, e.g. the followers of Donald Trump. I am doing network analysis, so in essence I will be looking at who is following whom, and later I will conduct similar analysis on what they are posting, etc. Thanks a lot for your help :-) – Waqas Chaudhary May 09 '18 at 11:41
  • Hey Amar, I was just re-running what you wrote, but today it suddenly stopped working and gives me the error: `Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column Called from: fix.by(by.y, y)`. I can't figure out why, because I tested it before and there was no issue with it. Any hints? – Waqas Chaudhary Jun 01 '18 at 15:30
  • That's an error produced by the `merge` function; maybe one of the `rtweet` functions uses it? You should post another question. – Amar Jun 05 '18 at 01:04
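The df-with-a-list-column layout suggested in the comments (one row per user, with a `followers` column that holds each user's vector of IDs) can be built in base R roughly as follows. This is a sketch under the assumption that `out` is a named list of data frames with a `user_id` column; the data is a hypothetical stand-in, and the column names `user`, `followers` and `n_followers` are illustrative choices.

```r
# Hypothetical stand-in for `out`: a named list of data frames with a
# single user_id column each.
out <- list(
  Batman   = data.frame(user_id = c("101", "102", "103"), stringsAsFactors = FALSE),
  CatWomen = data.frame(user_id = "201", stringsAsFactors = FALSE),
  Blade    = data.frame(user_id = c("301", "302"), stringsAsFactors = FALSE)
)

# One row per profile; `followers` is a list column holding each ID vector.
by_user <- data.frame(user = names(out), stringsAsFactors = FALSE)
by_user$followers <- lapply(out, function(df) df$user_id)

# Ordinary data-frame operations work on the scalar columns...
by_user$n_followers <- lengths(by_user$followers)

# ...and a single profile's followers can be pulled out when needed.
by_user$followers[[which(by_user$user == "Blade")]]
```

A list column does not map onto a flat CSV, so for storing this object between sessions `saveRDS()`/`readRDS()` keep the structure intact.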