
In the pipeline, I'm reading from Pub/Sub and attempting to write to Spanner. Writing to Bigtable works, but Spanner is a better fit for my needs.

In the image below I've expanded the relevant steps. In the top right corner is the "Debug spanner" step, which logs the expected messages via LOG. I'm quite confident that SpannerIO.Write sees those messages as well.
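
For reference, the "Debug spanner" step is essentially the following pass-through DoFn (a sketch, not the exact code; the logger and message format are placeholders):

    // Pass-through step that logs every mutation before it reaches SpannerIO.Write.
    static class DebugSpannerFn extends DoFn<Mutation, Mutation> {
        private static final Logger LOG = LoggerFactory.getLogger(DebugSpannerFn.class);

        @ProcessElement
        public void processElement(ProcessContext c) {
            LOG.info("to spanner: {}", c.element());
            c.output(c.element()); // forward the element unchanged
        }
    }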

Even without this debugging step, the messages still don't get written to Spanner.

There is no exception whatsoever, but as you can see further down, CreateDataflowView is "not started", and it occurs twice in the pipeline.

Reading from Pub/Sub and writing to Spanner with fewer steps does work - I've tested with code similar to the pipeline displayed below.
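
For completeness, the reduced test pipeline looked roughly like this (a sketch: the topic and table names are placeholders, the encoding step is trivialised, and spannerConfig is the same config shown further down):

    // options: my usual DataflowPipelineOptions
    Pipeline p = Pipeline.create(options);
    p.apply("read from pubsub",
            PubsubIO.readStrings().fromTopic("projects/X/topics/X"))
     .apply("encode", MapElements
            .into(TypeDescriptor.of(Mutation.class))
            .via((String msg) -> Mutation.newInsertOrUpdateBuilder("X")
                    .set("payload").to(msg)
                    .build()))
     .apply("write to spanner", SpannerIO.write()
            .withSpannerConfig(spannerConfig));
    p.run();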

What could cause this?

(the image is stitched together, but it contains the entire subtree up to the PDone step)

[image: pipeline]

The Spanner steps are created with this code:

    import org.apache.beam.sdk.io.gcp.spanner.SpannerConfig;
    import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
    import org.apache.beam.sdk.values.PDone;

    // Connection settings for the target database (actual IDs redacted).
    SpannerConfig spannerConfig = SpannerConfig.create()
            .withProjectId("X")
            .withInstanceId("X")
            .withDatabaseId("X");
    //spannerConfig.validate();                                    // does not throw an exception
    //SpannerAccessor accessor = spannerConfig.connectToSpanner(); // does not throw an exception
    PDone writtenToSpanner = encodedForSpanner.apply("write to spanner",
            SpannerIO.write()
                    .withSpannerConfig(spannerConfig)
                    .withBatchSizeBytes(0) // 0 should force each mutation to be written individually
    );
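
The .withBatchSizeBytes(0) call was added while debugging: as far as I understand SpannerIO, a batch size of 0 makes it flush each mutation on its own instead of grouping them, which should rule out batching as the culprit. The behaviour is the same with and without it.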
Flavius
  • Why do you set the batch size to 0? Have you tried with a higher value? – Lara Schmidt Mar 14 '18 at 18:36
  • Do you have a dataflow job id? – Lara Schmidt Mar 14 '18 at 20:39
  • @LaraSchmidt Yes, I've tried many variations, including without that setting. Is the Dataflow job ID the one you can cancel with the gcloud command? Yes, I have one. – Flavius Mar 15 '18 at 08:06
  • @LaraSchmidt Sorry, I didn't see you're at Google. This could help: 2018-03-14_02_54_40-3351769906679676708. Thanks – Flavius Mar 15 '18 at 08:19
  • Last finding: the code works with the direct runner. – Flavius Mar 15 '18 at 09:48
  • I'd suggest looking at the worker logs for the job. They can indicate errors in the case where a pipeline is stuck on a single location. Do you see any errors on there? I think this might give insight as to the problem. – Lara Schmidt Mar 15 '18 at 21:16
  • @Flavius, did you end up solving this problem? – Alex Hurst Mar 26 '19 at 04:10
  • @AlexHurst Not really. I don't recall the specifics. I ended up going through Google's support system and they confirmed it's a bug. But I have since ended my work with that customer, and I "solved" the problem by restructuring the data flow. Sorry, but good luck! – Flavius Mar 26 '19 at 13:48

0 Answers