1

I have a bucket in GCP that holds millions of 3 kB files, and I want to copy them over to an S3 bucket. I know Google has a super-fast transfer service, but I can't use it to push data back out to S3.

Because of the number of objects, running a simple gsutil -m rsync gs://mybucket s3://mybucket won't do the job: it would take at least a week to transfer everything.
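
For reference, this is roughly the command, with gsutil's parallelism settings turned up via its boto-config overrides (the values are only illustrative and the bucket names are placeholders):

    # same rsync, with gsutil's process/thread parallelism raised (illustrative values)
    gsutil -o "GSUtil:parallel_process_count=8" \
           -o "GSUtil:parallel_thread_count=25" \
           -m rsync gs://mybucket s3://mybucket   # add -r if the objects sit under nested prefixes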

Is there a faster solution than this?

dwardu
  • Are you running gsutil locally? The job might go faster if you run it from either GCE or EC2. Alternatively, it could be worth experimenting with how long it takes to bundle the files up into a tarball on the Google side, upload that to AWS, and unbundle it there. Also consider running many gsutils in parallel on different machines, each with a different prefix (a rough sketch of that idea follows below). – Brandon Yarbrough Jul 02 '18 at 19:13
  • So I've been running gsutil on a beefed-up instance on GCP. I tried on EC2, and it's the same outcome. – dwardu Jul 02 '18 at 20:54
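
To illustrate the prefix idea from the comments, a rough sketch: it assumes object names fan out under single-character prefixes like 0/ through f/, which is purely hypothetical, so adjust the list to your real key layout, and ideally run each shard from its own machine rather than one loop on one box.

    # run one rsync per key prefix, in parallel (prefixes here are hypothetical)
    for p in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
      nohup gsutil -m rsync -r "gs://mybucket/$p" "s3://mybucket/$p" \
          > "rsync-$p.log" 2>&1 &
    done
    wait   # block until every shard finishes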

3 Answers

1

On the AWS side, you may want to see if S3 Transfer Acceleration would help. There are specific requirements around enabling it and around the bucket's name. You would also want the bucket to be in a region close to where the data is currently stored, but it might help speed things up a bit.
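
If you do try it, a minimal sketch of turning it on with the AWS CLI (the bucket name is a placeholder; accelerated buckets must have DNS-compliant names without dots):

    # enable Transfer Acceleration on the destination bucket (placeholder name)
    aws s3api put-bucket-accelerate-configuration \
        --bucket my-destination-bucket \
        --accelerate-configuration Status=Enabled

    # make subsequent AWS CLI s3 commands use the accelerated endpoint
    aws configure set default.s3.use_accelerate_endpoint true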

chris
1

We ran into the same problem pushing small files to S3. Compressing them and storing them back amounts to the same thing. The real constraint is the request limits set on your account.

As mentioned in the documentation, you need to open a support ticket to have your limits increased before you send a burst of requests.

https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

It is NOT the size of each file or the total size of all the objects that matters here. The number of files you have is the problem.
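
If you do get throttled (HTTP 503 "SlowDown" responses) while bursting requests and you are driving the S3 side with the AWS CLI, these client-side settings can soften it; the values are purely illustrative:

    # retry throttled requests with adaptive backoff (illustrative values)
    aws configure set default.retry_mode adaptive
    aws configure set default.max_attempts 10
    # cap how many requests the CLI sends at once
    aws configure set default.s3.max_concurrent_requests 64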

Hope it helps.

Kannaiyan
0

Personally, I think the main issue you're going to have is not so much the ingress rate into Amazon's S3 service as the network egress rate out of Google's network. Even if you enable S3 Transfer Acceleration, you'll still be restricted by the egress speed of Google's network.

There are other services you can set up that might help speed up the process. Perhaps look into one of the Interconnect options, which let you set up fast links between the two networks. The easiest to set up is Cloud VPN, which can give you a fast uplink between an AWS and a Google network (1.5-3 Gbps per tunnel).

Otherwise, given your data requirements (millions of 3 kB objects works out to only on the order of tens of GB in total), this isn't a terrible amount of data, and setting up a cloud server to transfer it over the space of a week isn't too bad. You might find that by the time you've set up another solution, it would have been easier in the first place to just spin up a machine and let it run for a week.
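
A minimal sketch of that approach: a throwaway GCE worker (the instance name, zone, and machine type here are placeholders) running the rsync detached so it keeps going after you disconnect:

    # create a worker close to the source data (placeholder name/zone/type)
    gcloud compute instances create gcs-to-s3-worker \
        --zone=us-central1-a --machine-type=n1-standard-8
    gcloud compute ssh gcs-to-s3-worker --zone=us-central1-a

    # on the worker, after putting AWS credentials in ~/.boto:
    nohup gsutil -m rsync -r gs://mybucket s3://mybucket > transfer.log 2>&1 &
    tail -f transfer.log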

ScottMcC