
I've got a Kafka-to-BigQuery Dataflow pipeline where I am consuming from multiple topics and using dynamic destinations to output to the appropriate BigQuery tables for each topic.

The pipeline runs smoothly when only one worker is involved.

However, when it auto-scales or is manually configured to more than one worker, it completely stalls at the BigQuery write transform and data staleness grows continuously. The staleness grows regardless of how much data is streaming from Kafka, and no rows are inserted into BigQuery while this happens.

No errors are reported to the job or worker logs.

Specifically, the stall happens inside the BigQuery write transform, during the Reshuffle that occurs before the write.

Within that Reshuffle, there is a GroupByKey where the pipeline stalls, as shown below:

[screenshot: the GroupByKey inside the Reshuffle of the BigQuery write transform, where the pipeline stalls]

I can see that the Window.into step above is working fine:

[screenshot: the Window.into step, processing elements normally]

However, this problem is not isolated to the BigQuery write transform. If I add a Reshuffle step elsewhere in the pipeline, for example after the "Extract from Kafka record" step, the same stall appears in that new Reshuffle step, at the same GroupByKey.
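
For illustration, the extra Reshuffle was added roughly like this (a minimal sketch; Reshuffle.viaRandomKey() is an assumption about the exact variant, the surrounding transforms are the same as in the pipeline code below):

PCollection<MessageData> messageData = pipeline
    .apply("Read from Kafka",
        KafkaReadTransform.read(
            options.getBootstrapServers(),
            options.getInputTopics(),
            kafkaProperties))
    .apply("Extract from Kafka record",
        ParDo.of(new KafkaRecordToMessageDataFn()))
    // Extra shuffle added only to narrow down where the stall happens;
    // with more than one worker it hangs at the GroupByKey inside this step too.
    // (org.apache.beam.sdk.transforms.Reshuffle)
    .apply("Reshuffle", Reshuffle.viaRandomKey());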

I am using Beam SDK version 2.39.0.

The pipeline was designed following the example of the Kafka-to-BigQuery template found here. Below is an overview of the pipeline code; I have also attempted this with fixed windowing, as shown further down.

        PCollection<MessageData> messageData = pipeline
            .apply("Read from Kafka",
                KafkaReadTransform.read(
                    options.getBootstrapServers(),
                    options.getInputTopics(),
                    kafkaProperties))
            .apply("Extract from Kafka record",
                ParDo.of(new KafkaRecordToMessageDataFn()));

        /*
         * Step 2: Validate Protobuf messages and convert to FailsafeElement
         */
        PCollectionTuple failsafe = messageData
            .apply("Parse and convert to Failsafe", ParDo.of(new MessageDataToFailsafeFn())
                .withOutputTags(Tags.FAILSAFE_OUT, TupleTagList.of(Tags.FAILSAFE_DEADLETTER_OUT)));

        /*
         * Step 3: Write messages to BigQuery
         */
        WriteResult writeResult = failsafe.get(Tags.FAILSAFE_OUT)
            .apply("Write to BigQuery", new BigQueryWriteTransform(project, dataset, tablePrefix));

        /*
         * Step 4: Write errors to BigQuery deadletter table
         */
        failsafe.get(Tags.FAILSAFE_DEADLETTER_OUT)
            .apply("Write failsafe errors to BigQuery",
                new BigQueryDeadletterWriteTransform(project, dataset, tablePrefix));

        writeResult.getFailedInsertsWithErr()
            .apply("Extract BigQuery insertion errors", ParDo.of(new InsertErrorsToFailsafeRecordFn()))
            .apply("Write BigQuery insertion errors",
                new BigQueryDeadletterWriteTransform(project, dataset, tablePrefix));

BigQueryWriteTransform, where the pipeline is stalling:

BigQueryIO.<FailsafeElement<MessageData, ValidatedMessageData>>write()
    .to(new MessageDynamicDestinations(project, dataset, tablePrefix))
    .withFormatFunction(TableRowMapper::toTableRow)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
    .withExtendedErrorInfo()
    .optimizedWrites()
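
MessageDynamicDestinations is not shown here; it is the DynamicDestinations implementation that routes each element to the appropriate table. A rough sketch of its shape (the per-message-type table-naming scheme and the method bodies below are approximations for context, not the exact class):

import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class MessageDynamicDestinations
    extends DynamicDestinations<FailsafeElement<MessageData, ValidatedMessageData>, String> {

    private final String project;
    private final String dataset;
    private final String tablePrefix;

    public MessageDynamicDestinations(String project, String dataset, String tablePrefix) {
        this.project = project;
        this.dataset = dataset;
        this.tablePrefix = tablePrefix;
    }

    @Override
    public String getDestination(
            ValueInSingleWindow<FailsafeElement<MessageData, ValidatedMessageData>> element) {
        // The message type determines which table the element is routed to (assumed key).
        return element.getValue().getOutputPayload().getMessageType();
    }

    @Override
    public TableDestination getTable(String messageType) {
        // Tables named <tablePrefix>_<messageType> in the target dataset (assumed scheme).
        String tableSpec = String.format("%s:%s.%s_%s", project, dataset, tablePrefix, messageType);
        return new TableDestination(tableSpec, "Table for message type " + messageType);
    }

    @Override
    public TableSchema getSchema(String messageType) {
        // CREATE_NEVER is used, so no schema is needed for table creation.
        return null;
    }
}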

Formatting function:

public static TableRow toTableRow(FailsafeElement<MessageData, ValidatedMessageData> failsafeElement) {
    try {
        ValidatedMessageData messageData = Objects.requireNonNull(failsafeElement.getOutputPayload());

        byte[] data = messageData.getData();
        String messageType = messageData.getMessageType();
        long timestamp = messageData.getKafkaTimestamp();

        return TABLE_ROW_CONVERTER_MAP.get(messageType).toTableRow(data, timestamp);
    } catch (Exception e) {
        log.error("Error occurred when converting message to table row.", e);
        return null;
    }
}

KafkaReadTransform looks like this:

KafkaIO.<String, byte[]>read()
    .withBootstrapServers(bootstrapServer)
    .withTopics(inputTopics)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(ByteArrayDeserializer.class)
    .withConsumerConfigUpdates(kafkaProperties);

I have also tried this with fixed windowing in place after extracting the message data from the Kafka record, even though the template does not appear to use windowing:

PCollection<MessageData> messageData = pipeline
    .apply("Read from Kafka",
        KafkaReadTransform.read(
            options.getBootstrapServers(),
            options.getInputTopics(),
            kafkaProperties))
    .apply("Extract from Kafka record",
        ParDo.of(new KafkaRecordToMessageDataFn()))
    .apply("Windowing", Window.<MessageData>into(FixedWindows.of(WINDOW_DURATION))
        .withAllowedLateness(LATENESS_COMPENSATION_DURATION)
        .discardingFiredPanes());

I'm out of ideas and don't know Dataflow well enough to know how to diagnose this problem further.

  • Is the system latency increasing? An increasing system latency might suggest that there are retries present in the graph, which might indicate a problem with either your BigQuery configuration or your user code – Seng Cheong Jun 22 '22 at 20:57
  • @SengCheong yes, the system latency is increasing at the same rate as the data staleness. If what you are saying is true, it would be a problem with the user code. I wouldn't be surprised, because as mentioned, if I put my own Reshuffle step in before the BigQuery write transform, this user-created Reshuffle step produces the same stalling effect. But I don't know where to begin looking. I've already nearly re-written the entire pipeline. – NanoTree Jun 22 '22 at 21:04
  • You can look into the `MessageDynamicDestinations` and `toTableRow` functions for any code errors. You might want to ensure that your `KafkaIO` configuration is valid too – Seng Cheong Jun 23 '22 at 05:53
  • @SengCheong thanks. However, as I mentioned in my write up, if I place a Reshuffle step before the BigQuery step, the pipeline will stall in this new reshuffle step. So the problem seems to happen within Reshuffle. In addition, when I set Dataflow to run with only 1 worker, everything works as expected without any problem. So `MessageDynamicDestinations` and `toTableRow` appear to be unrelated. As far as KafkaIO config, I would expect it to fail for 1 worker the same as it does for 2 or more. But I will check to make sure I haven't missed something. – NanoTree Jun 23 '22 at 13:12
  • From my experience, the reason you are seeing this "staleness" is because of retries - the fused stage is being retried for some reason (probably one of the component `PTransforms` has failed). You may want to consider adding some sort of logging `ParDo`(s) to verify your user code is transforming the Kafka records/messages correctly. You can also check Cloud Monitoring for any other possible error logs; the log console in Dataflow doesn't list all possible error messages – Seng Cheong Jun 23 '22 at 14:18
  • I've discovered the cause. It turns out that it was a firewall configuration issue. My project uses a custom VPC and subnetwork configuration. It turns out that Dataflow workers are automatically given a network tag of `dataflow`. This tag was not included in the source tags for the firewall ingress rules, and so the worker VMs were not able to communicate with each other. It was [this comment](https://stackoverflow.com/questions/60059588/increasing-workers-causes-dataflow-job-to-hang-on-textio-write-executes-quickl#comment107153285_60059588) that tipped me off. – NanoTree Jun 23 '22 at 16:53
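
Following up on the cause noted in the last comment: the missing piece was a VPC firewall ingress rule letting the Dataflow worker VMs, which are automatically tagged `dataflow`, reach each other (Dataflow workers communicate over TCP ports 12345 and 12346). A sketch of the kind of rule that was needed; the rule name and network name below are placeholders:

gcloud compute firewall-rules create allow-dataflow-internal \
    --network=YOUR_CUSTOM_VPC \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:12345-12346 \
    --source-tags=dataflow \
    --target-tags=dataflow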
