I need to save HLL sketches to BigQuery from Apache Beam.
I found an extension library for Apache Beam that computes them:
But I can't find a way to save the sketch itself to BigQuery, so that I can later use it with the merge function and other functions over a sliding time window: see this link
My code:
    .apply("hll-count", Combine.perKey(ApproximateDistinct.ApproximateDistinctFn
        .create(StringUtf8Coder.of())))
    .apply("reify-windows", Reify.windows())
    .apply("to-table-row", ParDo.of(new DoFn<ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>>, TableRow>() {
        @ProcessElement
        public void processElement(ProcessContext processContext) {
            ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>> windowed = processContext.element();
            KV<GroupByData, HyperLogLogPlus> keyData = windowed.getValue();
            GroupByData key = keyData.getKey();
            HyperLogLogPlus hyperLogLogPlus = keyData.getValue();
            if (key != null) {
                TableRow tableRow = new TableRow();
                tableRow.set("country_code", key.countryCode);
                tableRow.set("event", key.event);
                tableRow.set("profile", key.profile);
                tableRow.set("occurrences", hyperLogLogPlus.cardinality());
So far I have only found how to get the estimate with hyperLogLogPlus.cardinality(),
but how can I write the sketch buffer itself to BigQuery, in a way that lets me run the merge function on it later?
Writing the result of hyperLogLogPlus.getBytes() also didn't work for merge.
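To make the getBytes attempt concrete, here is a minimal sketch of the encoding step only. It assumes the target column has type BYTES and that the row is written as a TableRow, in which case, as far as I know, the value has to be passed as Base64 text. SketchBase64 is a hypothetical helper (not part of Beam or stream-lib), and the byte array is a placeholder standing in for the output of hyperLogLogPlus.getBytes(). It does not address whether BigQuery's merge functions understand this serialization format.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper, not part of Beam or stream-lib. */
final class SketchBase64 {

    /** Encodes raw sketch bytes as standard Base64 text. */
    static String encode(byte[] sketchBytes) {
        return Base64.getEncoder().encodeToString(sketchBytes);
    }

    /** Decodes Base64 text back into the raw sketch bytes. */
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for hyperLogLogPlus.getBytes().
        byte[] sketchBytes = {1, 2, 3, (byte) 0xFF};

        String encoded = encode(sketchBytes);
        byte[] restored = decode(encoded);

        // Print the encoded text and whether the round trip preserved the bytes.
        System.out.println(encoded);
        System.out.println(Arrays.equals(sketchBytes, restored));
    }
}
```

In the DoFn above this would look something like tableRow.set("sketch", SketchBase64.encode(hyperLogLogPlus.getBytes())), where "sketch" is an assumed column name and the IOException declared by getBytes() still has to be handled.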