9

I have a batch job that takes a couple of hours to run. How can I run this in a serverless way on Google Cloud?

App Engine, Cloud Functions, and Cloud Run are limited to 10-15 minutes. I don't want to rewrite my code in Apache Beam.

Is there an equivalent to AWS Batch on Google Cloud?

Lak

9 Answers

11

Note: Cloud Run and Cloud Functions can now run for up to 60 minutes. The answer below remains a viable approach if you have a multi-hour job.

Vertex AI Training is serverless and long-lived. Wrap your batch-processing code in a Docker container, push it to gcr.io, and then run:

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI

You can run any arbitrary Docker container — it doesn’t have to be a machine learning job. For details, see:

https://cloud.google.com/vertex-ai/docs/training/create-custom-job#create_custom_job-gcloud
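For reference, building and pushing the container might look like this (the image name here is illustrative, not prescribed by the docs):

# hypothetical image name; replace PROJECT_ID with your own project
docker build -t gcr.io/PROJECT_ID/my-batch-job:latest .
docker push gcr.io/PROJECT_ID/my-batch-job:latest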

Today you can also use Cloud Batch: https://cloud.google.com/batch/docs/get-started#create-basic-job

Lak
5

Google Cloud does not offer a product comparable to AWS Batch (see https://cloud.google.com/docs/compare/aws/#service_comparisons).

Instead, you'll need to use Cloud Tasks or Pub/Sub to delegate the work to another product such as Compute Engine, though that approach isn't truly "serverless".
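As a rough sketch of the delegation side (the topic name and payload are illustrative), you would publish a message that a Compute Engine worker subscribes to:

# example topic name and message; a worker on Compute Engine pulls and processes it
gcloud pubsub topics create batch-jobs
gcloud pubsub topics publish batch-jobs --message='{"job":"nightly-report"}'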

Dustin Ingram
4

Google has finally released (in Beta for the moment) Cloud Batch, which does exactly what you want: you submit jobs (containers or scripts) and it runs them. Simple as that. https://cloud.google.com/batch/docs/get-started
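A minimal sketch following the linked quickstart (the job name, region, and script are illustrative): define a job.json like

{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": {"text": "echo Hello from Batch"}
      }]
    }
  }]
}

and submit it with:

# job name and region are placeholders
gcloud batch jobs submit my-batch-job \
  --location=us-central1 \
  --config=job.json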

Adrien QUINT
2

I faced the same problem. In my case I went with:

  1. Cloud Scheduler starts the job by pushing a message to Pub/Sub.
  2. Pub/Sub triggers a Cloud Function.
  3. The Cloud Function spins up a Compute Engine instance.
  4. The Compute Engine instance runs the batch workload and kills itself once it's done.

You can read my post on Medium: https://link.medium.com/1K3NsElGYZ

It might help you get started. There's also a follow-up post showing how to use a Docker container inside the Compute Engine instance: https://medium.com/google-cloud/serverless-batch-workload-on-gcp-adding-docker-and-container-registry-to-the-mix-558f925e1de1
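As a sketch of step 1 (the schedule, topic, and job name are illustrative), the Cloud Scheduler job that pushes to Pub/Sub can be created like this:

# runs daily at 02:00; topic and message body are examples
gcloud scheduler jobs create pubsub start-batch \
  --schedule="0 2 * * *" \
  --topic=batch-trigger \
  --message-body="start"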

mesmacosta
2

This answer to "How to make GCE instance stop when its deployed container finishes?" will work for you as well:

In short:

  • First, dockerize your batch process.
  • Then, create an instance:
    • using a container-optimized image,
    • and using a startup script that pulls your Docker image, runs it, and shuts down the machine at the end (a minimal sketch follows).
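A minimal startup-script sketch under those assumptions (the image name is illustrative; startup scripts run as root, so no sudo is needed):

#! /bin/bash
# Pull and run the batch container, then power off the instance.
docker pull gcr.io/PROJECT_ID/my-batch-job:latest
docker run --rm gcr.io/PROJECT_ID/my-batch-job:latest
shutdown -h now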
Iñigo González
1

You can use Cloud Run. At the time of writing, the Cloud Run (fully managed) timeout has been increased to 60 minutes, though this is still in beta.

https://cloud.google.com/run/docs/configuring/request-timeout

Important: Although Cloud Run (fully managed) has a maximum timeout of 60 minutes, only timeouts of 15 minutes or less are generally available: setting timeouts greater than 15 minutes is a Beta feature.
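For example, the timeout can be raised on an existing service like this (the service name and region are placeholders):

# 3600 seconds = 60 minutes, the current maximum
gcloud run services update SERVICE_NAME --timeout=3600 --region=REGION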

Suds
0

Another alternative for batch computing is using Google Cloud Life Sciences.

An example application using Cloud Life Sciences is dsub.

Or see the Cloud Life Sciences Quickstart documentation.
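A minimal dsub invocation might look like this (the project, bucket, and command are illustrative):

# google-cls-v2 is the Cloud Life Sciences provider
dsub \
  --provider google-cls-v2 \
  --project PROJECT_ID \
  --regions us-central1 \
  --logging gs://YOUR_BUCKET/logs \
  --image ubuntu:20.04 \
  --command 'echo "hello from dsub"' \
  --wait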

indraniel
0

I found myself looking for a solution to this problem and built something similar to what mesmacosta has described in a different answer, in the form of a reusable tool called gcp-runbatch.

If you can package your workload into a Docker image then you can run it using gcp-runbatch. When triggered, it will do the following:

  1. Create a new VM
  2. On VM startup, docker run the specified image
  3. When the docker run exits, the VM will be deleted

Some features that are supported:

  • Invoke batch workload from the command line, or deploy as a Cloud Function and invoke that way (e.g. to trigger batch workloads via Cloud Scheduler)
  • stdout and stderr will be piped to Cloud Logging
  • Environment variables can be specified by the invoker, or pulled from Secret Manager

Here's an example command line invocation:

$ gcp-runbatch \
  --project-id=long-octane-350517 \
  --zone=us-central1-a \
  --service-account=1234567890-compute@developer.gserviceaccount.com \
  hello-world
Successfully started instance runbatch-38408320. To tail batch logs run:
CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud beta --project=long-octane-350517
logging tail 'logName="projects/long-octane-350517/logs/runbatch" AND
resource.labels.instance_id="runbatch-38408320"' --format='get(text_payload)'
0

GCP launched its new "Batch" service in July '22. It's basically Compute Engine packaged with some utilities to easily productionize a batch job, including defining required resources, executables (script- or container-based), and a run schedule.

Haven't used it yet, but seems like a great fit for batch jobs that take over 1 hr.

Nathan Gould