
I have a list of questions about AWS usage. I am not sure whether I have the correct answers or whether I am following best practice.

Before using AWS, I did the following on my MacBook:

  - Maintain a small .odb database (around 100MB), which is expected to grow to a few GB within a year.
  - Run a few R scripts that scrape the web and import the data into the database.
  - Run a few more R scripts that extract data from the database and do analysis.

Given the growing data volume and the more complex analytics I need to perform, my MacBook is always heavily loaded, so I decided to switch to AWS for more computing power when needed. I am using the AWS free tier, and below is what I have successfully done with AWS so far:

  1. I created an EC2 instance and could retrieve files from my S3 bucket.
  2. I can perform analysis using my R scripts and save the result in my S3 bucket.

And here is the list of my questions:

  1. For maintaining a database of ~1GB, is it fine to simply put it in S3 and load the whole file into R every time? Or should I try the RDS service?

  2. Is there a charge for data transfer between EC2 instances and my S3 bucket? (i.e. does it matter whether I transfer 10GB or 1000GB in and out between an instance and S3?) I am not sure where to find this information.

  3. For web scraping on an EC2 instance, is there a charge for the internet connection? Or does the cost depend only on the instance type I choose, regardless of whether I perform computation or web scraping?

  4. I also read a few articles on AWS EBS, but I am quite confused about the differences between S3, EBS, and setting up RDS.
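
To make question 1 concrete, here is a minimal sketch of the two access patterns being compared: reading the whole dataset into memory and filtering in the script, versus letting a SQL engine return only the rows needed. It is written in Python with the built-in `sqlite3` module purely as a stand-in (the real scripts are in R and the real database is an .odb file). The table name `prices`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the real database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, day TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("AAA", "2018-05-30", 10.0),
        ("AAA", "2018-05-31", 11.0),
        ("BBB", "2018-05-30", 20.0),
        ("BBB", "2018-05-31", 19.5),
    ],
)


def load_everything_then_filter(conn, symbol):
    """Pattern A: pull the whole table into memory, then filter in the script."""
    all_rows = conn.execute(
        "SELECT symbol, day, close FROM prices ORDER BY symbol, day"
    ).fetchall()
    return [row for row in all_rows if row[0] == symbol]


def query_only_what_is_needed(conn, symbol):
    """Pattern B: let the database filter, so only matching rows are returned."""
    return conn.execute(
        "SELECT symbol, day, close FROM prices WHERE symbol = ? ORDER BY day",
        (symbol,),
    ).fetchall()


rows_a = load_everything_then_filter(conn, "AAA")
rows_b = query_only_what_is_needed(conn, "AAA")
```

Both functions return the same rows. The difference is how much data has to be read before the filter is applied, which is the trade-off in question as the file grows from ~1GB to several GB.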

I expect my data volume to grow exponentially as I write more R scripts to scrape different publicly available data for analysis. In terms of computing power, I currently need more than my MacBook offers, mainly for parallel processing and analytics. I will also test some machine learning algorithms in the future.

Any advice would be useful.

desertnaut
Bosco Lam
  • Too broad, too many different questions. For the first one, RDS is something you'd use if you want to host your data in the cloud, and have it be accessible there. – Tim Biegeleisen Jun 01 '18 at 06:18
  • I understand the topic is quite broad, but my question touches on several areas of AWS usage, so I decided to group them in a single post instead of spreading them all over the place. As for RDS, is it mainly used for hosting data in the cloud? I am more used to SQL-based data extraction and processing, and I thought setting up RDS might suit my style better. Are there any drawbacks to using RDS compared with simply saving my data in S3? – Bosco Lam Jun 01 '18 at 06:58
  • The AWS documentation will cover most of your questions, I think. You should not expect other people to do your legwork. – Tim Biegeleisen Jun 01 '18 at 07:00
  • I searched the AWS documentation, but some of the terminology confused me. For example, regarding charges, I looked at the AWS Simple Monthly Calculator but could not figure out the items under the Data Transfer section (and whether they refer to data transfer between EC2 and S3 or between EC2 and other services). – Bosco Lam Jun 01 '18 at 07:04

0 Answers