0

I'm trying to apply mutations (Increments) to Bigtable via Dataflow using cloud-bigtable-client (https://github.com/GoogleCloudPlatform/cloud-bigtable-client).

Here is a high-level summary of what my job does:

  PCollection<SomeData> somedata = ...;
  somedata.apply(ParDo.of(new CreateMutations()))
      .setCoder(new HBaseMutationCoder()).apply(CloudBigtableIO.writeToTable(config));
  // I don't think it is necessary to explicitly set Coder here; I tried both ways.

CreateMutations is a DoFn that looks like:

// c.element() is KV<String, Iterable<SomeData>>
public void processElement(ProcessContext c) {
  Increment mutation = new Increment(c.element().getKey().getBytes());
  for (SomeData data : c.element().getValue()) {
    // Obtain cf (String), qual (String), value (long) from data.
    // None of them is null. 
    mutation.addColumn(cf.getBytes(), qual.getBytes(), value);
  }
  c.output(mutation);
}

Surprisingly, the job fails when this DoFn is executed because elements cannot be encoded by HBaseMutationCoder. Here is a small portion of the stack trace:

(e8a8d266ed05e19f): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to encode element 'row=some_string, families={(family=a, columns={some_string/a:2:text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=1, some_string/a:8:text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=9620}), (family=m, columns={some_string/m:2:text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=1, some_string/m:8:text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=9620}}' with coder 'HBaseMutationCoder'.
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at  ......

Notice that within the error message, it clearly shows that row, family,column qualifiers, and values are filled properly. This particular error message shows that it contains four cells to be incremented. I haven't had trouble using Delete's and Put's, but it's my first time to use Increment's -- is there anything else I need to populate other than row, family, qualifier, and value?

Any help will be really appreciated.

I also tried to use Put instead of Increment and it worked (it's the same code as above, except for two lines as marked with (*)).

// c.element() is KV<String, Iterable<SomeData>>
public void processElement(ProcessContext c) {
  Put mutation = new Put(c.element().getKey().getBytes()); //(*)
  for (SomeData data : c.element().getValue()) {
    // Obtain cf (String), qual (String), value (long) from data.
    // None of them is null. 
    mutation.addImmutable(cf.getBytes(), qual.getBytes(), Bytes.toBytes(value)); //(*)
  }
  c.output(mutation);
}

(I found a relevant question here: How to load data into Google Cloud Bigtable from Google BigQuery but the issue I am having doesn't seem to be caused by null values as all of row/column family/qualifier/value's are filled properly.)


update: here's full stack trace I got.

    (875583981e325b46): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to encode element 'row=some_string, families={(family=a, columns={some_string/a:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940, some_string/a:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2}), (family=m, columns={some_string/m:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2, some_string/m:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940}}' with coder 'HBaseMutationCoder'.
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:284)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:508)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsViaIteratorsDoFn.processElement(GroupAlsoByWindowsViaIteratorsDoFn.java:123)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to encode element 'row=some_string, families={(family=a, columns={some_string/a:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940, some_string/a:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2}), (family=m, columns={some_string/m:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2, some_string/m:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940}}' with coder 'HBaseMutationCoder'.
    at com.google.cloud.dataflow.sdk.util.UserCodeException.wrap(UserCodeException.java:35)
    at com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:40)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:369)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:51)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:158)
    ... 24 more
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to encode element 'row=some_string, families={(family=a, columns={some_string/a:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940, some_string/a:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2}), (family=m, columns={some_string/m:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2, some_string/m:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940}}' with coder 'HBaseMutationCoder'.
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:450)
    at com.moloco.dataflow.bigtable.AptRecovery$UpdateCountPerCell.processElement(AptRecovery.java:78)
Caused by: java.lang.IllegalArgumentException: Unable to encode element 'row=some_string, families={(family=a, columns={some_string/a:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940, some_string/a:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2}), (family=m, columns={some_string/m:2:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=2, some_string/m:8:some_text/LATEST_TIMESTAMP/Put/vlen=8/seqid=0+=940}}' with coder 'HBaseMutationCoder'.
    at com.google.cloud.dataflow.sdk.coders.StandardCoder.getEncodedElementByteSize(StandardCoder.java:170)
    at com.google.cloud.dataflow.sdk.coders.StandardCoder.registerByteSizeObserver(StandardCoder.java:185)
    at com.google.cloud.dataflow.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:641)
    at com.google.cloud.dataflow.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:552)
    at com.google.cloud.dataflow.sdk.runners.worker.MapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(MapTaskExecutorFactory.java:351)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:61)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:46)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:158)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:450)
    at com.moloco.dataflow.bigtable.AptRecovery$UpdateCountPerCell.processElement(AptRecovery.java:78)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:158)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:284)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:508)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsViaIteratorsDoFn.processElement(GroupAlsoByWindowsViaIteratorsDoFn.java:123)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeW
Community
  • 1
  • 1

1 Answers1

1

OK, I see an IllegalArgumentException thrown by HBaseMutationCoder.java. Looking at https://github.com/apache/beam/blob/master/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseMutationCoder.java#L68 , INCREMENT does not work because it's not idempotent (but PUT is), which explains what you're seeing.

rf-
  • 1,443
  • 10
  • 15