First, some background information on pricing:
Google has pretty good documentation about how to ingest data into GCS. From that guide:
Today, when you move data to Cloud Storage, there are no ingress traffic charges. The gsutil tool and the Storage Transfer Service are both offered at no charge. See the GCP network pricing page for the most up-to-date pricing details.
The "network pricing page" just says:
[Traffic type: Ingress] Price: No charge, unless there is a resource such as a load balancer that is processing ingress traffic. Responses to requests count as egress and are charged.
There is additional information on the GCS pricing page about your idea to use a GCE VM to write to GCS:
There are no network charges for accessing data in your Cloud Storage buckets when you do so with other GCP services in the following scenarios:
- Your bucket and GCP service are located in the same multi-regional or regional location. For example, accessing data in an asia-east1 bucket with an asia-east1 Compute Engine instance.
Later on that same page, there is also information about per-request pricing:
Class A Operations: storage.*.insert[1]
[1] Simple, multipart, and resumable uploads with the JSON API are each considered one Class A operation.
Class A operations are billed per 10,000 operations, at either $0.05 or $0.10 depending on the storage class. I believe you would only be doing one Class A operation per file that you upload (or just one in total, if you upload a single file), so this probably wouldn't add up to much overall.
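To put rough numbers on that, here is a back-of-the-envelope calculation, assuming one Class A operation per uploaded file and charges that scale linearly with the operation count. The function name is a hypothetical choice, and the $0.05 / $0.10 rates are the ones quoted above, which may have changed since.

```python
def class_a_cost(num_uploads: int, price_per_10k: float) -> float:
    """Estimate Class A charges in USD.

    Assumes one Class A operation per uploaded file (hypothetical simplification)
    and a rate quoted per 10,000 operations, e.g. 0.05 or 0.10.
    """
    # Rates are quoted per 10,000 operations, so scale linearly by the count.
    return num_uploads / 10_000 * price_per_10k


# 100,000 files at the higher $0.10 rate
print(round(class_a_cost(100_000, 0.10), 2))  # prints 1.0
```

So even a tarball with a hundred thousand files would come to roughly a dollar in Class A operations under these assumptions.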
Now to answer your question:
For your use case, it sounds like you want the files in the tarball to be individual files in GCS (as opposed to having one big tarball stored as a single file in GCS). The first step is to untar it somewhere, and the second step is to use gsutil cp to copy the extracted files to GCS.
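A minimal Python sketch of those two steps, assuming the tarball is readable on the machine you run it from and that gsutil is installed and authenticated there. The function names (extract_tarball, gsutil_upload_command, upload_tarball), the paths and the bucket URL are hypothetical placeholders; the only gsutil flags used are -m (run operations in parallel) and -r (copy recursively).

```python
import subprocess
import tarfile
from pathlib import Path


def extract_tarball(tarball: str, dest: str) -> Path:
    """Step 1: extract every member of the tarball into dest."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # Mode "r" (the default) auto-detects gzip/bz2/xz compression.
    with tarfile.open(tarball) as tar:
        tar.extractall(str(dest_path))
    return dest_path


def gsutil_upload_command(local_dir: str, bucket_url: str) -> list[str]:
    """Step 2: build the gsutil command that copies the extracted files.

    -m parallelizes the copy; -r recurses into subdirectories.
    """
    return ["gsutil", "-m", "cp", "-r", local_dir, bucket_url]


def upload_tarball(tarball: str, dest: str, bucket_url: str) -> None:
    """Run both steps; bucket_url is a placeholder such as "gs://my-bucket/"."""
    extracted = extract_tarball(tarball, dest)
    # check=True raises CalledProcessError if gsutil exits with a non-zero status.
    subprocess.run(gsutil_upload_command(str(extracted), bucket_url), check=True)
```

For example, upload_tarball("data.tar.gz", "extracted", "gs://my-bucket/") would extract into ./extracted and then run gsutil -m cp -r extracted gs://my-bucket/, so the objects would land under gs://my-bucket/extracted/. Only extract archives you trust, since extractall writes whatever paths the archive contains.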
Unless you have to (e.g. there isn't enough space on the machine that holds the tarball now), I wouldn't recommend copying the tarball to an intermediate VM in GCE before uploading to GCS, for two reasons:
- gsutil cp already handles a bunch of annoying edge cases for you: parallel uploads (with gsutil -m cp), resuming an upload after a network failure, retries, checksum comparisons, etc.
- Using any GCE VMs will add cost to the whole copy operation: costs for the disks plus costs for the VMs themselves.
If you want to try the procedure out with something lower-risk first, make a small directory with a few files and a few megabytes of data, use gsutil cp to copy it, and then check how much you were charged for that. From the GCS pricing page:
Charges accrue daily, but Cloud Storage bills you only at the end of the billing period. You can view unbilled usage in your project's billing page in the Google Cloud Platform Console.
So you'd just have to wait a day to see how much you were billed.
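One way to build that small test directory is sketched below, assuming you then upload it with gsutil yourself. The function name, directory name and default sizes are hypothetical choices; the files are filled with random bytes just so they have real content of a known size.

```python
import os
from pathlib import Path


def make_test_dir(path: str, num_files: int = 5, size_mb: int = 1) -> int:
    """Create num_files files of size_mb MiB each; return the total bytes written."""
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    total = 0
    for i in range(num_files):
        # Random bytes of a fixed size, written to file_0.bin, file_1.bin, ...
        data = os.urandom(size_mb * 1024 * 1024)
        (root / f"file_{i}.bin").write_bytes(data)
        total += len(data)
    return total
```

After running make_test_dir("small_test"), something like gsutil -m cp -r small_test gs://my-bucket/ would upload the five 1 MiB files, where gs://my-bucket/ stands in for your own bucket.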