2

We have a use case in which a Java application needs to access millions of files. We currently store them on an EBS volume, which is turning out to be an expensive option (we have reached 15 TB so far), so we are looking at S3 for file storage. We can tolerate the added latency.

One option is to mount S3 using s3fs and access the files through the mount. However, we have faced quite a few issues with s3fs, so I am exploring whether AWS Storage Gateway can provide better caching and faster access.

Andrew Gaul
A.K.Desai
  • 1
    Have you evaluated whether `sc1` EBS volumes would work for your use case? At multi-TB sizes they have some impressive performance, but only 25% the cost of `gp2`. Then you just have ordinary disks. – Michael - sqlbot Jun 20 '18 at 16:01
  • 1
    That's a great suggestion. I am gonna explore more on that front. Looking at the cost, both S3 and sc1 are giving around same cost for 16TB storage – A.K.Desai Jun 20 '18 at 17:56

1 Answer

4

Avoid using s3fs if possible because it merely emulates a file system and is likely to run into problems with high utilization.

The best solution is for your application to access the files directly in Amazon S3 via S3 API calls, rather than pretending that S3 is a filesystem. This works very well for large-scale applications, and there is no mount or gateway to administer or maintain because your application communicates directly with S3. You should seriously consider this option.
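If you go the API route, one small piece you would need is a mapping from the paths your code uses today to S3 object keys. Below is a minimal sketch of that step only, using the standard library. The class name `S3KeyMapper`, the method `toObjectKey`, and the mount root `/data/files` are hypothetical and only for illustration. It also assumes the objects were uploaded with keys equal to their paths relative to the old mount. The actual read would then go through the AWS SDK's GetObject call, which is left out here because it needs the SDK dependency and credentials.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a file under the old EBS mount to an S3 object key.
// Assumption: each object's key is the file's path relative to the mount root.
class S3KeyMapper {

    static String toObjectKey(Path mountRoot, Path file) {
        Path root = mountRoot.toAbsolutePath().normalize();
        Path target = file.toAbsolutePath().normalize();
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IllegalArgumentException(file + " is not a file under " + mountRoot);
        }
        // S3 keys use "/" as the conventional delimiter regardless of the OS separator.
        List<String> parts = new ArrayList<>();
        for (Path part : root.relativize(target)) {
            parts.add(part.toString());
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Example path is made up; the file does not need to exist for the mapping.
        Path root = Path.of("/data/files");
        System.out.println(toObjectKey(root, Path.of("/data/files/reports/2018/june.pdf")));
    }
}
```

With a mapping like this in one place, the rest of the code can keep working with the paths it already knows, and only the layer that opens the file needs to change when you switch to S3.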

If you do really need to access the files via a filesystem, consider using AWS Storage Gateway – File Gateway, which can present S3 storage as an NFS share.

John Rotenstein
  • Thank you, John. Our current architecture expects the files to be on a mounted file system, so I was looking for options to move the files from EBS to S3 without changing the Java code. If I can achieve this through Storage Gateway, then well and good. I will keep the S3 API in mind for future enhancements. – A.K.Desai Jun 20 '18 at 14:15