
I'm migrating files from one remote server to S3. There are about 10k files (all accessible via HTTP URLs from the remote server), totaling about 300GB (no individual file is more than 1GB). I'm trying to figure out the most efficient way to do this migration. So far I have an EC2 instance with s3cmd and the PHP SDK installed, and I have a text file with all the URLs as well. I'm able to move files from EC2 to S3 without any issue, but the problem is that if I download everything to EC2 I run out of storage. Is there a way I can download a file to EC2 (maybe reading from the txt file), move it to S3 (using s3cmd), and then delete the file from EC2 before I go to the next file?
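
Concretely, something like this per-file loop is what I have in mind (just a rough sketch; urls.txt is my list of URLs, my-bucket is a placeholder name, and s3cmd is already configured):

# read one URL at a time, download it, push it to S3, then free the space
while read -r url; do
    file=$(basename "$url")
    curl -sS -o "/tmp/$file" "$url"                  # download one file to the EC2 instance
    s3cmd put "/tmp/$file" "s3://my-bucket/$file"    # upload it to the bucket
    rm "/tmp/$file"                                  # delete it before moving on to the next URL
done < urls.txt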

Ideally I would want to download everything straight to S3 from the remote location, but I don't think that is possible, unless someone here says it is.

Thanks in advance for the help.

chips

1 Answer


I don't see what OS your current EC2 instance is running, but if it is Linux you could use s3fs:
https://github.com/s3fs-fuse/s3fs-fuse/wiki/Fuse-Over-Amazon

That will allow you to mount your bucket like a local drive/folder. Then you can simply move the files there, and it will upload them to the bucket in the background. I would move them in batches of some kind to make it easy to track; moving them removes them from your local file system after uploading. You could also just copy them to the bucket this way. When done, you could do a simple comparison to make sure the same files exist in both folders, and then you are done.

EDIT based on question asked in comment for clarity

On the remote machine, set up Fuse with your AWS credentials.
Mount your S3 bucket. It will look like a local folder structure in Ubuntu.
Let's say your current files are in
/var/myfiles/folder1 and /var/myfiles/folder2
Mount your S3 bucket at /mybucket, then:
mv /var/myfiles/folder1 /mybucket/folder1
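
A minimal sketch of the mount step itself, assuming s3fs is already installed and the bucket name and keys below are placeholders:

echo 'ACCESS_KEY_ID:SECRET_ACCESS_KEY' > ~/.passwd-s3fs    # s3fs credentials file (placeholder values)
chmod 600 ~/.passwd-s3fs
mkdir -p /mybucket
s3fs my-bucket /mybucket -o passwd_file=${HOME}/.passwd-s3fs   # mount the bucket at /mybucket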

Again, I would move them in batches and make sure the folders match up before continuing.
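
If you copy rather than move, one quick way to do that comparison through the mount (a sketch; paths match the example above):

ls /var/myfiles/folder1 | sort > /tmp/local_list.txt
ls /mybucket/folder1 | sort > /tmp/bucket_list.txt    # listing the bucket through the s3fs mount
diff /tmp/local_list.txt /tmp/bucket_list.txt && echo "batch matches"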

END EDIT

If your EC2 instance is Windows, there are other ways to mount an S3 bucket as a local drive, and then the same process could take place.

greg_diesel
  • I am using Ubuntu. So are you saying that once I mount my bucket (using Fuse), I will be able to download directly to that folder (S3)? – chips May 28 '15 at 19:25
  • @chips Yes, you can just 'move' the files from the local disk folder to the newly mounted S3 bucket folder. There is still going to be a lot of data transfer, but s3fs will handle that for you. I added more details to my answer above to make it clearer. – greg_diesel May 28 '15 at 20:05
  • Commenting on your edit: I think the problem with moving is the storage space. I can't download stuff to my EC2 in the first place (batches would be time consuming), but your original answer is still valid if I can download directly there, so for example cd /mybucket ; curl -O URL (see the sketch below). – chips May 28 '15 at 20:10
  • If the problem is running out of disk space, you can expand the available space - EBS now supports up to 16TB volumes. You could download everything, then use the AWS CLI to push it up to S3 - aws s3 sync will do it. – chris May 28 '15 at 20:51
  • If you are going to use s3fs, only use it to move files to and from S3. Any applications that read files at a block level may not work well with s3fs. – datasage May 29 '15 at 01:37
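
Following up on the curl-into-the-mount idea above, a minimal sketch (assumes the bucket is already mounted at /mybucket and urls.txt is the URL list from the question; s3fs still buffers each file locally while it uploads, but only one file at a time):

# download each URL straight into the mounted bucket
while read -r url; do
    curl -sS -o "/mybucket/$(basename "$url")" "$url"
done < urls.txt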