
I am an R user working on a project that involves gaining insights from Twitter data (more specifically, scraping Twitter data using the rtweet package, and conducting a set of analyses on this data). In addition, I have built a Shiny app based on this data for visualisation purposes.

Where I Need Further Inputs

Today, the Twitter data that I scrape is stored locally on my laptop. However, I'd like to do this differently. Ideally, I'd like to achieve the following -

1) The data is scraped from Twitter using rtweet package and stored directly on a cloud platform (like AWS or Microsoft Azure, for example).

2) I'd like to define a periodicity for this scraping process (for example: once every two days). I'd like to achieve this through some scheduling tool.

3) Eventually, I'd like my Shiny app (hosted on shinyapps.io) to be able to communicate with this cloud platform and retrieve the tweets stored in it for analysis.
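Steps 1 and 3 above can be sketched in a few lines of R, assuming the `aws.s3` package (from the cloudyr project) for the S3 side and AWS credentials supplied via environment variables; the bucket name, object key, and search query below are placeholders, not anything from the question:

```r
library(rtweet)  # search_tweets()
library(aws.s3)  # s3saveRDS(), s3readRDS()

# aws.s3 reads credentials from the environment:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION

# 1) Scrape tweets and write them straight to an S3 bucket
tweets <- search_tweets("#rstats", n = 1000, include_rts = FALSE)
s3saveRDS(tweets, object = "tweets/latest.rds", bucket = "my-tweet-bucket")

# 3) In the Shiny app (server code), read the same object back for analysis
tweets <- s3readRDS(object = "tweets/latest.rds", bucket = "my-tweet-bucket")
```

For the Shiny app on shinyapps.io, the same environment variables can be set in the app's configuration so that `s3readRDS()` works there too.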

I have searched the Internet for solutions, but haven't found anything straightforward yet.

If anyone has experience doing this, your inputs would be highly appreciated.

Varun
    What have you tried? Your question is almost "can you do my project for me?". Please try and reduce your posts down to one question at a time with a potential clear and correct answer, giving examples of how you have tried, and don't be ashamed if you've failed. We're all failures here. – Spacedman Apr 20 '18 at 15:58
  • I'm just trying to get suggestions on the best direction to take, based on your experiences. I have not attempted to store my data on a cloud platform yet, I am relatively new to this. – Varun Apr 20 '18 at 16:04
  • So your first question is "How do I store data from R to a cloud server?", and if you can specify that cloud provider then even better. That Q might already have an answer, or an R package. – Spacedman Apr 20 '18 at 16:06
  • One option is to create an account in AWS; you'll need a credit card for that. Then, read [this](http://strimas.com/r/rstudio-cloud-1/) on how to run an RStudio server on AWS. For scraping tweets you can use `search_tweets` or `stream_tweets` from the `rtweet` package. There is no need for a scheduling tool: you can embed either of those functions in a loop and use the `Sys.sleep` function to specify how long to wait until the next iteration. Then save the retrieved data. I haven't connected the data stored in AWS with a Shiny app yet. – csmontt Apr 20 '18 at 17:43
  • When you say save the data, you mean save it as an object in the AWS RStudio server Environment? – Varun Apr 20 '18 at 21:33
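The loop-plus-`Sys.sleep` approach suggested in the comments could look like the sketch below; the query, bucket name, and two-day interval are placeholders, and the `aws.s3` package is an assumption for the upload step:

```r
library(rtweet)
library(aws.s3)

repeat {
  tweets <- search_tweets("#rstats", n = 1000)
  # timestamped object name, so each run keeps its own snapshot in the bucket
  key <- sprintf("tweets/%s.rds", format(Sys.time(), "%Y-%m-%d_%H%M"))
  s3saveRDS(tweets, object = key, bucket = "my-tweet-bucket")
  Sys.sleep(2 * 24 * 60 * 60)  # wait two days before the next scrape
}
```

Note that a long-running loop dies if the R session does; on a server, a cron job that runs a short `Rscript` every two days is a more robust way to get the same periodicity.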

1 Answer


You create an account at AWS, then create an S3 bucket. On the virtual server or machine from which you want to copy, you install the AWS CLI (the client for interacting with AWS resources).

Then you run the copy command and the files are copied to the cloud.

The same way back: you use the CLI to retrieve the files.
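If you would rather stay inside R than shell out to the AWS CLI, the `aws.s3` package (an assumption; the answer itself only mentions the CLI) exposes the same two copy operations; the file and bucket names are placeholders:

```r
library(aws.s3)

# copy a local file up to the bucket (the equivalent of `aws s3 cp` upload)
put_object(file = "tweets.rds", object = "tweets.rds", bucket = "my-tweet-bucket")

# same way back: retrieve the file from the bucket to the local disk
save_object(object = "tweets.rds", bucket = "my-tweet-bucket", file = "tweets.rds")
```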

Vladyslav Didenko