
I have multiple S3 buckets on an AWS account, and I also have an EC2 machine running RStudio Pro. I would like to access my S3 buckets (which hold several terabytes of data each).

I would like to set up RStudio to mount the buckets as data sets, without having to copy everything into an EBS volume before reading it each time.

Any help would be great.

spsaaibi
Josh Beauregard
    You can take a look at this very work in progress package: https://github.com/cloudyr/aws.s3 – Thomas Mar 19 '16 at 13:32

2 Answers


It seems that you could try the aws.s3 package from the cloudyr project, https://github.com/cloudyr/aws.s3.

With this, assuming you have your data on a private bucket, you could access it as follows:

aws.s3::getbucket(
  bucket = 'hpk',
  key = YOUR_AWS_ACCESS_KEY,
  secret = YOUR_AWS_SECRET_ACCESS_KEY
)

Hopefully this will help you access data from your buckets. You can then also try aws.ec2 to communicate with your EC2 machine.
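Since the goal is to read data without first copying whole buckets to EBS, it may also help to pull individual objects straight into memory with aws.s3. A minimal sketch, assuming your credentials are set as environment variables and using placeholder bucket/object names (`hpk`, `my-data.csv`):

```r
library(aws.s3)

# aws.s3 reads credentials from the standard AWS environment variables;
# the values and region below are placeholders.
Sys.setenv(
  "AWS_ACCESS_KEY_ID"     = "YOUR_AWS_ACCESS_KEY",
  "AWS_SECRET_ACCESS_KEY" = "YOUR_AWS_SECRET_ACCESS_KEY",
  "AWS_DEFAULT_REGION"    = "us-east-1"
)

# Read a single object directly into a data frame, without
# downloading the rest of the bucket:
df <- s3read_using(read.csv, object = "my-data.csv", bucket = "hpk")
```

This reads one object at a time, so a multi-terabyte bucket never has to be staged locally in full; only the objects you actually request are transferred.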

spsaaibi
  • So this method is working great so far, but I hit a wall with R version compatibility: I don't know how to check which R version the aws.s3 package works with, or how to install a specific version of R. Getting: `package ‘aws.s3’ is not available (for R version 3.2.4 Revised)` when I try to install the package – Josh Beauregard Mar 21 '16 at 19:34
  • @JoshBeauregard you should install this package following its installation guide, not from CRAN: https://github.com/cloudyr/aws.s3#installation – Marcin Dec 13 '17 at 13:35
  • You can now install the package from CRAN. Also: I had to add the `region =` argument for it to connect. – Humpelstielzchen May 15 '20 at 11:17
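As the comments note, older aws.s3 releases had to be installed from the cloudyr repository rather than CRAN, and some setups need an explicit `region =` argument to connect. A sketch of both, with the repository URL taken from the cloudyr installation guide and the bucket name and region as placeholders:

```r
# Install from the cloudyr drat repository (per the linked installation
# guide); on current R versions plain install.packages("aws.s3") from
# CRAN also works.
install.packages(
  "aws.s3",
  repos = c(cloudyr = "http://cloudyr.github.io/drat", getOption("repos"))
)

# Passing region explicitly, as the comment above suggests
# ('hpk' and 'eu-central-1' are placeholders):
b <- aws.s3::get_bucket(bucket = "hpk", region = "eu-central-1")
```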

My go-to package for these types of tasks in Python is boto, and it looks like there isn't a ported version for R.

I have not tried this, but you might find it useful:

RS3

BGA