In Trino, I'm getting the error message "Cannot deserialize HyperLogLog"
I have a query on Snowflake, doing the following:
select
__TENANT_ID,
hll_accumulate(VISITOR_ID) as visitor_hll
from
[table]
where
[stuff]
group by
1;
The visitor_hll is being written to a column of type BINARY(8388608).
I then have a process that copies this data to Parquet files on S3, where I query it via Trino. When I try to perform HyperLogLog operations on the column, such as
select
merge(cast(visitor_hll as hyperloglog)) as bsi_hll
from
[table]
I get the aforementioned error.
What can I do in order to consume the HLL data created in Snowflake?
I searched for the error message, and the only results on Google are the source code for the HLL implementation in Airlift.
I also saw that Snowflake says "For integration with external tools, Snowflake supports converting states from the BINARY format to an OBJECT (which can be printed and exported as JSON), and vice versa." (see HLL_EXPORT). This returns a JSON object, but on the Trino/S3 side of things, I don't see any way of importing this back into an HLL.
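For reference, a minimal sketch of the export on the Snowflake side, assuming the accumulated states live in a table I'm calling hll_states here (a placeholder name, not the real one). It only shows the HLL_EXPORT call on the BINARY column and does not solve the import into Trino:

```sql
-- hll_states is a hypothetical table name; visitor_hll is the BINARY
-- state produced by hll_accumulate in the query above.
select
    __TENANT_ID,
    hll_export(visitor_hll) as visitor_hll_obj  -- OBJECT, can be exported as JSON
from
    hll_states;
```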