
I get the error "Could not commit request due to validation error: INVALID_ARGUMENT: Pubsub publish requests are limited to 10MB, rejecting message over 7168K (size 7245K) to avoid exceeding limit with byte64 request encoding" on my enrich step.

I followed Simo Ahava's tutorial: Install Snowplow On The Google Cloud Platform | Simo Ahava's blog.

The error appears on the Dataflow step of the beam-enrich worker. It stops all processing, and no data is inserted into BigQuery.

Error log:

```json
{
  "insertId": "7514256621418980731:34459:0:179556",
  "jsonPayload": {
    "line": "active_work_manager.cc:1564",
    "message": "132593 Could not commit request due to validation error: INVALID_ARGUMENT: Pubsub publish requests are limited to 10MB, rejecting message over 7168K (size 7245K) to avoid exceeding limit with byte64 request encoding.",
    "thread": "194"
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "job_name": "beam-enrich",
      "project_id": "XXXXXXXXXX",
      "region": "europe-central2",
      "job_id": "2021-03-30_14_43_03-13952642482494084906",
      "step_id": ""
    }
  },
  "timestamp": "2021-03-31T08:38:58.534286Z",
  "severity": "ERROR",
  "labels": {
    "compute.googleapis.com/resource_name": "beam-enrich-03301443-wkq8-harness-w1zs",
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/job_id": "2021-03-30_14_43_03-13952642482494084906",
    "dataflow.googleapis.com/region": "europe-central2",
    "compute.googleapis.com/resource_type": "instance",
    "compute.googleapis.com/resource_id": "7514256621418980731",
    "dataflow.googleapis.com/job_name": "beam-enrich"
  },
  "logName": "projects/XXXXXXXXXX/logs/dataflow.googleapis.com%2Fshuffler",
  "receiveTimestamp": "2021-03-31T08:39:21.828671489Z"
}
```
fedoz

1 Answer


Pub/Sub has these hard resource limits: https://cloud.google.com/pubsub/quotas#resource_limits

Does your pipeline publish to Pub/Sub? You might need to decrease the size of your messages by splitting them or truncating them.

ningk
  • How can I split messages? – fedoz Apr 09 '21 at 18:24
  • Unfortunately I can't find requests like this. Do you know how I can find them? – fedoz Apr 09 '21 at 18:25
  • I think it depends on the structure of your message. If your message looks like a variable-length array of repeated structures, you can split the array into fixed-length arrays that do not exceed the size limit and publish them to Pub/Sub as multiple messages (you can also add a field saying how many parts the original message was split into). The same goes for large plain text: split it into fixed-length text messages and send multiple Pub/Sub messages. Then, in your Beam pipeline, reconstruct the original object as a single item by grouping the parts by some unique id. – ningk Apr 12 '21 at 21:56
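The split-and-regroup idea described in the comments can be sketched roughly as follows. This is a minimal illustration, not Snowplow's or Beam's actual code: the field names (`original_id`, `part`, `total`), the 7 MB chunk size, and the helper names are all assumptions, and the actual Pub/Sub publish call and Beam `GroupByKey` wiring are omitted.

```python
import uuid

# Assumed safety margin well under Pub/Sub's 10 MB request limit,
# leaving headroom for base64 encoding and message attributes.
CHUNK_SIZE = 7 * 1024 * 1024


def split_message(payload: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Split a large payload into numbered parts sharing a common id."""
    original_id = str(uuid.uuid4())
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    return [
        {
            "original_id": original_id,  # key for regrouping the parts later
            "part": index,               # position of this chunk
            "total": len(chunks),        # how many parts the original was split into
            "data": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


def reassemble(parts: list[dict]) -> bytes:
    """Reconstruct the original payload once all parts have been collected."""
    ordered = sorted(parts, key=lambda p: p["part"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing parts for message %s" % ordered[0]["original_id"])
    return b"".join(p["data"] for p in ordered)
```

On the publishing side each dict would become one Pub/Sub message; in the Beam pipeline you would group incoming parts by `original_id` and call something like `reassemble` once all `total` parts have arrived.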