
I've currently got a base Windows 2008 Server AMI that I created on Amazon EC2. I use it to create 20-30 EBS-based EC2 instances at a time for processing large amounts of data into PDFs for a client. However, once the data processing is complete, I have to manually connect to each machine and copy off the files. This takes a lot of time and effort, so I'm trying to figure out the best way to use S3 as centralised storage for the output PDF files.

I've seen a number of third party (commercial) utilities that can map S3 buckets to drives within Windows, but is there a better, more sensible way to achieve what I want? Having not used S3 before, only EC2, I'm not sure of what options are available, and I've not been able to find anything online addressing the issue of using S3 as centralised storage for multiple EC2 Windows instances.

Update: Thanks for the suggestions of command-line tools for using S3. I was hoping for something a little more integrated and less ad hoc. Seeing as EC2 is closely related to S3 (S3 used to be the default storage mechanism for AMIs, etc.), I thought there might be something neater/easier I could do, perhaps even around Virtual Private Clouds and EC2-backed S3 servers or something (an area I know nothing about). Any other ideas?

pauldunlop
  • You probably fixed this a long time ago; if not, why not attach an EBS volume to one server and simply share it with the other servers using a Windows net share? – Yooakim Oct 24 '11 at 06:14
  • That's actually the idea I came up with in the end, although by that point we'd run out of time and just did it the hard way, manually copying stuff around machines. Because the EC2 machines all have internal IP addresses, and you can create groups of machines, this seems like the most logical solution. Thanks for the suggestion. Definitely the way I'll try next time (a rough sketch of the commands is below). – pauldunlop Oct 24 '11 at 12:40
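
A minimal sketch of that net-share setup. The drive letter, share name, folder paths and internal IP below are hypothetical placeholders, and it assumes the collecting instance has the EBS volume attached as D: and that the security group allows SMB (TCP 445) between the instances:

    :: On the "collector" instance that has the EBS volume attached as D:
    :: create a folder for the PDFs and share it on the internal network.
    mkdir D:\pdfout
    net share pdfout=D:\pdfout /GRANT:Everyone,FULL

    :: On each worker instance, map the share over the collector's internal IP
    :: (10.0.0.5 is a placeholder) and copy the generated PDFs across.
    net use Z: \\10.0.0.5\pdfout /persistent:no
    copy C:\output\*.pdf Z:\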

3 Answers


I'd probably look for a command line tool. A quick search on Google led me to a .NET tool:

http://s3.codeplex.com/

And a Java one:

http://www.beaconhill.com/opensource/s3cp.html

I'm sure there are others out there as well.
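
Whichever tool you pick, the pattern on each instance is the same: once the PDFs are generated, push the contents of the output folder to one shared bucket. Here's a minimal sketch using the Python AWS SDK (boto3); the bucket name, output folder and key prefix are placeholders you'd swap for your own, and credentials would come from an IAM role or local AWS config:

    import os
    import socket

    import boto3  # AWS SDK for Python; any S3 client with an upload call follows the same pattern

    OUTPUT_DIR = r"C:\pdf-output"       # placeholder: wherever your app writes the PDFs
    BUCKET = "my-client-pdf-output"     # placeholder: one shared bucket for all instances
    PREFIX = socket.gethostname()       # per-instance prefix so 20-30 machines don't collide

    s3 = boto3.client("s3")

    # Upload every PDF in the output folder under this instance's prefix.
    for name in os.listdir(OUTPUT_DIR):
        if name.lower().endswith(".pdf"):
            local_path = os.path.join(OUTPUT_DIR, name)
            key = PREFIX + "/" + name
            s3.upload_file(local_path, BUCKET, key)
            print("uploaded", local_path, "-> s3://" + BUCKET + "/" + key)

Run that as the final step of the processing job (or as a scheduled task) and every instance writes to one central location without anyone having to RDP in and copy files by hand.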

dana
  • The only thing I'd say about this is it feels a bit ad hoc. I would have thought/hoped that, as I'm using EC2, another Amazon cloud service, there might be better ways to achieve this. I'm assuming, based on the small range of answers thus far, that this isn't the case? – pauldunlop Dec 12 '10 at 19:22
  • Chops - I agree, you'd think there would be a better way. For now at least, it seems like you have to use some sort of S3 client to transfer data from your AMI to a shared S3 bucket. As an FYI, if you search for "ec2" on the S3 forum there is a lot of stuff out there (https://forums.aws.amazon.com/search.jspa?objID=f24&q=ec2&x=0&y=0). – dana Dec 14 '10 at 17:04
  • Cheers @dana. I'll have a look around. – pauldunlop Dec 15 '10 at 14:54

You could run an EC2 instance with an EBS volume exported through Samba; that could act as centralized storage that the Windows instances map as a network drive.
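
A rough sketch of what that could look like, assuming a Linux instance with the EBS volume mounted at /mnt/pdfout and Samba installed; the share name and internal IP are placeholders, and TCP 445 would need to be open between the instances' security groups:

    # /etc/samba/smb.conf on the Linux storage instance:
    # exposes the EBS-backed directory as a share the Windows instances can map.
    [pdfout]
        path = /mnt/pdfout
        browseable = yes
        read only = no
        guest ok = yes    ; acceptable inside a locked-down security group, not for public exposure

Then on each Windows instance:

    :: Map the share over the Linux instance's internal IP (10.0.0.5 is a placeholder)
    net use P: \\10.0.0.5\pdfout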

Krishna
  • That's an interesting idea; I'd need to investigate what the connectivity speeds are like across EC2 instances, and how easily they can communicate with one another. Ideally, it would be nice for them to write the files across the internal IPs, but that's not something I've tried before. – pauldunlop May 03 '11 at 14:27

This sounds very much like a Hadoop/Amazon Elastic MapReduce job to me. Unfortunately, Hadoop is best deployed on Linux:

Hadoop on Windows Server

I assume the software you use for PDF processing is Windows-only? If that's not the case, I'd seriously consider porting your solution to Linux.

jvdbogae
  • Unfortunately it's a totally custom piece of software that requires a full Windows .NET/IIS environment. We've no choice over that. The current setup we have works fine for the actual generation. It's purely an issue of finding a way to have all 30 machines write out to a single, centralised location to reduce post-processing labour time. – pauldunlop Dec 10 '10 at 15:42