
I am using Apache Beam 2.22.0 (Java SDK) and want to log metrics and write them to a GCS bucket after a batch pipeline finishes execution. I have tried calling result.waitUntilFinish() and then running the intended post-pipeline code, with these results:

  1. DirectRunner: the GCS object is created as expected and the logs appear on the console.
  2. DataflowRunner: the GCS object is created, but the post-pipeline logs do not appear in Stackdriver.

Problem: when a Dataflow template is created for the same pipeline and staged on GCS, neither the GCS object is created nor do the logs appear when the job is launched from the template.
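For context, a minimal sketch of the launcher pattern described above (transform construction elided; the class name is illustrative):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunAndReport {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);
    // ... build the transforms here ...

    PipelineResult result = pipeline.run();
    result.waitUntilFinish(); // blocks the launching process until the batch job completes

    // Post-pipeline code: query the metrics and persist them, e.g. to a GCS object.
    MetricQueryResults metrics =
        result.metrics().queryMetrics(MetricsFilter.builder().build());
    for (MetricResult<Long> counter : metrics.getCounters()) {
      System.out.println(counter.getName() + ": " + counter.getAttempted());
    }
  }
}
```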

user2179539
2 Answers


What you are doing is the correct way of getting a signal that the pipeline is done. There is no direct API in Apache Beam for getting that signal from within the running pipeline; waitUntilFinish() is the supported approach.

For your logging problem, you need to use the Cloud Logging API in your code. This is because the pipeline is submitted to the Dataflow service and runs on GCE VMs, which log to Cloud Logging, whereas the code outside of your pipeline runs locally on the machine that launched the job.

See Perform action after Dataflow pipeline has processed all data for a little more information.
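Something like this, a minimal sketch using the google-cloud-logging Java client from the launcher process (the log name is a placeholder; authentication uses Application Default Credentials):

```java
import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.StringPayload;
import com.google.cloud.logging.Severity;
import java.util.Collections;

public class PostRunLogger {
  public static void main(String[] args) throws Exception {
    // Client built from Application Default Credentials of the launching machine.
    try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
      LogEntry entry =
          LogEntry.newBuilder(StringPayload.of("Pipeline finished, metrics written to GCS"))
              .setSeverity(Severity.INFO)
              .setLogName("post-pipeline-log") // placeholder log name
              .setResource(MonitoredResource.newBuilder("global").build())
              .build();
      logging.write(Collections.singleton(entry));
    } // close() flushes any pending entries
  }
}
```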

Cubez
  • Yes, I agree about the Cloud Logging API. Regarding the code outside the pipeline, i.e. after `result.waitUntilFinish()`: my observation is that it is never invoked if I create and use a GDF template, though it works fine with the DataflowRunner and DirectRunner. – user2179539 Aug 05 '20 at 07:16

It is possible to export the logs from your Dataflow job to Google Cloud Storage, BigQuery, or Pub/Sub. To do that, you can use the Cloud Logging console, the Cloud Logging API, or gcloud logging to route the desired entries to a specific sink.

In summary, to use the log export:

  1. Create a sink, selecting Google Cloud Storage as the sink service (or one of the other destinations).
  2. Within the sink, create a query to filter your logs (optional).
  3. Set the export destination.

Afterwards, every time Cloud Logging receives new entries it adds them to the sink; only entries created after the sink exists are exported.
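As a sketch, the same sink can be created with gcloud (the sink and bucket names below are placeholders; the filter assumes you want the Dataflow job logs):

```
# Route Dataflow job logs to a GCS bucket
# (my-log-sink and my-log-bucket are placeholder names).
gcloud logging sinks create my-log-sink \
    storage.googleapis.com/my-log-bucket \
    --log-filter='resource.type="dataflow_step"'
```

Note that you must also grant the sink's writer identity (shown by `gcloud logging sinks describe my-log-sink`) write access to the bucket.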

While you did not mention whether you are using custom metrics, I should point out that you need to follow the metric naming rules, here. Otherwise, they won't show up in Stackdriver.
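For illustration, a minimal sketch of a custom Beam counter with a name that stays within those rules (the DoFn and metric names are mine):

```java
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn with a custom counter; "elements_processed" sticks
// to letters, digits, and underscores to satisfy the naming rules.
class CountingFn extends DoFn<String, String> {
  private final Counter processed =
      Metrics.counter(CountingFn.class, "elements_processed");

  @ProcessElement
  public void processElement(@Element String element, OutputReceiver<String> out) {
    processed.inc(); // reported under the CountingFn namespace
    out.output(element);
  }
}
```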

Alexandre Moraes
  • Yes, I am using custom metrics, and the issue is not with them showing up, nor with logs inside the pipeline. The issue is that the code after `result.waitUntilFinish()` is not executed when I create and use a GDF template. – user2179539 Aug 05 '20 at 07:24
  • @user2179539, is it working now? If not, can you copy your code to your question? – Alexandre Moraes Aug 13 '20 at 07:19
  • No, it doesn't work; sorry, I haven't found time to get back here. I have used the approach at [link](https://rmannibucau.metawerx.net/post/apache-beam-initialization-destruction-task) instead. – user2179539 Sep 11 '20 at 06:39