I am struggling to figure out how to resolve an issue I am seeing with this Dataflow job. I saw a similar thread in the Apache Beam archives, but I did not quite understand how to apply that information.

Essentially, data is being streamed into BigQuery (which works). I am trying to write the same BigQuery rows into Spanner in the same Dataflow job, which raises the following runtime exception:

    java.lang.IllegalArgumentException: Attempted to get side input window for GlobalWindow from non-global WindowFn
    org.apache.beam.sdk.transforms.windowing.PartitioningWindowFn$1.getSideInputWindow(PartitioningWindowFn.java:47) ....

The relevant section of the Dataflow graph is shown in the linked image (data flow graph), and the code I am using to write to Spanner is here:

    sensorReports
            .apply("WindowSensorReportByMonth",
                    // Five-minute fixed windows, fired one minute after the first element arrives in a pane.
                    Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(5)))
                            .triggering(AfterProcessingTime.pastFirstElementInPane()
                                    .plusDelayOf(Duration.standardMinutes(1)))
                            .withAllowedLateness(Duration.ZERO)
                            .discardingFiredPanes())
            // Convert each BigQuery TableRow into a Spanner Mutation.
            .apply("CreateSensorReportMutation", ParDo.of(new RowToMutationTransform()))
            .apply("Write to Spanner",
                    SpannerIO.write()
                            .withDatabaseId(propertiesUtils.getSpannerDBId())
                            .withInstanceId(propertiesUtils.getSpannerInstanceId())
                            .withProjectId(propertiesUtils.getSpannerProjectId())
                            .withBatchSizeBytes(0));
C McShane

1 Answer

SpannerIO.write() internally reads the DB schema using a global window and uses this as a side input, so your non-global-windowed Mutations are clashing with it.

You could put all your Mutations into a global window before passing them to SpannerIO.write():

    .apply("To Global Window", Window.into(new GlobalWindows()))

but in Beam versions 2.5 to 2.8, this will result in either an error or nothing ever being written (as SpannerIO never supported streaming pipelines).
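
Note that on an unbounded PCollection, rewindowing into the global window alone is usually not enough, because any downstream GroupByKey also needs a trigger. A minimal sketch of a global window with a repeating processing-time trigger might look like the following. The class name, the helper name, and the choice of delay are illustrative assumptions, not part of SpannerIO, and the version caveat above still applies.

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Hypothetical helper class, for illustration only.
class GlobalWindowWithTrigger {

    // Builds a Window transform that places elements in the global window and
    // fires a pane repeatedly, `delay` after the first element of each pane.
    static <T> Window<T> globalWindowWithTrigger(Duration delay) {
        return Window.<T>into(new GlobalWindows())
                .triggering(Repeatedly.forever(
                        AfterProcessingTime.pastFirstElementInPane().plusDelayOf(delay)))
                .withAllowedLateness(Duration.ZERO)
                .discardingFiredPanes();
    }
}
```

It could then be applied in place of the one-liner above, for example `.apply("To Global Window", GlobalWindowWithTrigger.<Mutation>globalWindowWithTrigger(Duration.standardMinutes(1)))`.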

Edited answer:

However, SpannerIO in Beam versions before 2.9.0 does not support streaming pipelines. Version 2.4 and earlier did work in streaming mode, provided you did not pass a windowed PCollection to it.

You will be pleased to hear that this is all fixed in version 2.9 (release in progress), where SpannerIO both supports streaming writes and handles the windowing correctly.

RedPandaCurios
  • Hi, thanks for your help! I tried your suggestion anyway and it gave this exception: `java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. Use a Window.into or Window.triggering transform prior to GroupByKey`. Also, this [question](https://stackoverflow.com/questions/51480770/streaming-mutationgroups-into-spanner) seems to suggest that streaming capabilities have been merged into the latest version of Apache Beam (although I am not sure which version). – C McShane Dec 09 '18 at 21:20
  • Agh, sorry, yes, the original answer won't work; you also need to trigger the data for GroupByKey. Edited. The PR mentioned in that SO question is actually mine, and is part of the 2.9.0 release. If you want to try it, you can do a local Beam build on the 2.9 branch, or wait a week or so for 2.9.0 to be released. – RedPandaCurios Dec 10 '18 at 07:15