
I'm facing a problem sharing storage between multiple EC2 instances. I need to run heavy jobs, so I'll need a lot of instances to do it. On one side, I have an EBS volume attached to one server instance. On the other side, I have a worker instance. I created an AMI of this worker instance and then launched several instances from that AMI. They are all running in the same VPC. Basically, the server instance sends jobs and the workers execute them. I would like my workers to save their log files to the shared storage while they run the jobs, something like:

worker_1/logfile.log

worker_2/logfile.log
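To make the goal concrete, here is a minimal sketch of what each worker would do, assuming the shared storage were mounted at /mnt/shared and each worker knew its own name (both are placeholders, not part of my actual setup):

```python
import logging
import os

# Placeholder values: the mount point and worker name depend on the setup.
SHARED_MOUNT = "/mnt/shared"
WORKER_NAME = "worker_1"

# Each worker writes to its own directory, e.g. /mnt/shared/worker_1/logfile.log
log_dir = os.path.join(SHARED_MOUNT, WORKER_NAME)
os.makedirs(log_dir, exist_ok=True)

logging.basicConfig(
    filename=os.path.join(log_dir, "logfile.log"),
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("worker started")
```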

What could be the best solution to do that?

  • I read it's not possible to attach the same EBS volume to multiple instances.
  • I had a look at GlusterFS but here is what I found:

"Before realizing a proof of concept with two servers, in different availability zones, replicating an EBS volume with an ext4 filesystem, we will list the cases where GlusterFS should not be used: Sequential files written simultaneously from multiple servers such as logs. The locking system can lead to serious problems if you store logs within GlusterFS. The ideal solution it’s to store them locally then use S3 to archive them. If necessary we can consolidate multiple server logs before or after storing them in S3."

  • And finally, I've also checked an S3 bucket mounted with s3fs, but I found out it's not a good option either:

"You can't partially update a file with s3fs so changing a single byte will re-upload the entire file" . Then if you want to make small incremental change then its a definite no. You can't use s3fs - S3 Just doesn't work that way you can't incrementally change a file."

So what could be a good solution to my problem that allows my workers to write their log files to shared storage?

Thanks for your help!


Romanzo Criminale
  • Take a look at a similar question here http://stackoverflow.com/questions/841240/can-you-attach-amazon-ebs-to-multiple-instances – Scott Jun 03 '15 at 18:50

4 Answers


Thanks for the answers, but in the end I'm using NFS between the instances and it works pretty well!
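For anyone with the same problem, a rough sketch of how each worker mounts the export at startup; the server address and export path below are placeholders for whatever you configure in /etc/exports on the server instance:

```python
import os
import subprocess

# Placeholders: use the NFS server's private IP and the export path
# you configured in /etc/exports on the server instance.
NFS_SERVER = "10.0.0.10"
NFS_EXPORT = "/srv/share"
MOUNT_POINT = "/mnt/shared"

os.makedirs(MOUNT_POINT, exist_ok=True)

# Equivalent to: mount -t nfs 10.0.0.10:/srv/share /mnt/shared
subprocess.run(
    ["mount", "-t", "nfs", f"{NFS_SERVER}:{NFS_EXPORT}", MOUNT_POINT],
    check=True,
)
```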

Romanzo Criminale
  • I can recommend NFS in AWS. I've been running it for over a year now without any issues. Networking within the same region works very well. It does, however, take one extra instance to maintain as the NFS server. – WooDzu Mar 14 '14 at 21:24
  • NFS is still a SPOF (single point of failure), and AWS EFS is only available in preview mode in the Oregon region. I think the best we can do is create an NFS-specific EBS volume, and if the EC2 instance goes down, detach it and attach it to another instance. This will take some time, though, because prep work will have to be done on the new instance. – Mike Purcell Oct 29 '15 at 02:28

As described in this thread and in some of the answers already provided, the two common ways to accomplish this goal have been to use S3 or NFS to share data between instances.

On April 9th 2015, Amazon announced Amazon Elastic File System (Amazon EFS), which provides a much better solution to the problem you're trying to solve.
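For anyone evaluating it, a minimal sketch with boto3 (this assumes credentials are configured and a region where EFS is available; the creation token, subnet, and security group IDs are placeholders):

```python
import boto3

efs = boto3.client("efs")

# CreationToken makes the call idempotent; the token itself is arbitrary.
fs = efs.create_file_system(CreationToken="worker-logs")
fs_id = fs["FileSystemId"]

# Workers need a mount target in their VPC subnet; the IDs are placeholders.
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)

# Instances then mount the file system over NFSv4, and every worker
# sees the same directory tree.
```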

Scott
  • AWS EFS is only available in preview mode in the Oregon region, and there is no official word on when it will be out of preview mode and available in all regions. – Mike Purcell Oct 29 '15 at 02:29

Did you consider the option of having each worker write its logs to a local disk (maybe even to the ephemeral partition), and then having each worker upload its own big log file to S3 after it finishes?

This is somewhat similar to what happens when you use Elastic MapReduce to run some distributed tasks on a Hadoop cluster.

You'd get high write throughput (since each worker writes to a local disk, especially if you use the ephemeral partition), and also high aggregate upload throughput to S3 (since you'd have the bandwidth of many workers available).
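A minimal sketch of that idea; the bucket name, worker name, and local log path below are placeholders:

```python
import boto3

# Placeholders: bucket, worker name, and local log path are illustrative.
BUCKET = "my-job-logs"
WORKER_NAME = "worker_1"
LOCAL_LOG = "/media/ephemeral0/logfile.log"

def upload_log_when_done():
    """Once the job finishes, push the whole local log file to S3,
    keyed by worker so logs stay separated per instance."""
    s3 = boto3.client("s3")
    s3.upload_file(LOCAL_LOG, BUCKET, f"{WORKER_NAME}/logfile.log")
```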

Bruno Reis
  • Thanks for your answer. I wanted to have the workers writing logs while they're running, not after the job is done (because it can be a really long job). But for now I'm trying NFS; we'll see how it goes! – Romanzo Criminale Jul 04 '13 at 05:46
  • Another option would be to make the workers write the logs to local disks (appending to a local file) until some limit is reached (number of lines written, number of bytes written, or some amount of time elapsed, whatever). When the limit is reached, stop writing to that file, start writing to a new one, and upload the finished file to S3; a sketch of this idea follows below. – Bruno Reis Jul 04 '13 at 05:56
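A minimal sketch of that rotate-then-upload idea (the bucket, worker name, and size limit are all placeholders):

```python
import os
import boto3

BUCKET = "my-job-logs"        # placeholder
WORKER = "worker_1"           # placeholder
MAX_BYTES = 5 * 1024 * 1024   # rotate after roughly 5 MB; pick your own limit

s3 = boto3.client("s3")
part = 0

def log_line(line):
    """Append to the current local chunk; once it passes MAX_BYTES,
    ship the finished chunk to S3 and start a new one."""
    global part
    path = f"/tmp/{WORKER}-part{part}.log"
    with open(path, "a") as f:
        f.write(line + "\n")
    if os.path.getsize(path) >= MAX_BYTES:
        s3.upload_file(path, BUCKET, f"{WORKER}/part{part}.log")
        os.remove(path)
        part += 1
```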

Not entirely sure of the context, but would writing objects directly to a mounted S3 bucket be feasible?

George