I'm starting to try out Apache Beam and want to use it to read an HBase table and count its rows. When I read the table without Count.globally, it reads the rows fine, but when I try to count the number of rows, the process hangs and never exits.
Here is the very simple code:
    Pipeline p = Pipeline.create(options);
    p.apply("read", HBaseIO.read().withConfiguration(configuration).withTableId(HBASE_TABLE))
        .apply(ParDo.of(new DoFn<Result, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                Result result = c.element();
                String rowkey = Bytes.toString(result.getRow());
                System.out.println("row key: " + rowkey);
                c.output(rowkey);
            }
        }))
        .apply(Count.<String>globally())
        .apply("FormatResults", MapElements.via(new SimpleFunction<Long, String>() {
            public String apply(Long element) {
                System.out.println("result: " + element.toString());
                return element.toString();
            }
        }));
When I use Count.globally, the process never finishes. When I comment it out, the process prints all the rows.
Any ideas?