Internal db logic/operation to group/compress result

Question

I have a CrateDB table storing various information for zipcodes. It contains around 30k zipcodes, and I need my query to return certain profiling information for all zipcodes at once. I understand that typically it wouldn't be feasible, but since I only need ballpark information and many zipcodes are consecutive, I think an optimization is possible.

For example, if I wanted to profile population, a grouped result such as this would work for me:

group 1 (0-1000): 00000-02000,02004-02010,02012
group 2 (1001-3000): ...
...

The populations and groups above are fake, but the idea should hold. Basically: group the profiled category into buckets, assign each zipcode to the correct bucket, and further reduce the size by using a range representation for consecutive zipcodes. I could settle for a predefined number of groups, or have the bucket boundaries defined by the request/query itself. This would hopefully shrink the response from something too large for a single query to something manageable.
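A minimal sketch of the bucket-and-range idea, written in Python rather than as a CrateDB function, just to make the target output format concrete. It assumes 5-digit numeric zipcodes, integer populations and inclusive bucket upper bounds; the function names and sample data are hypothetical.

```python
from bisect import bisect_left


def bucket_index(value, upper_bounds):
    """Index of the first bucket whose inclusive upper bound is >= value."""
    return bisect_left(upper_bounds, value)


def bucket_label(index, upper_bounds):
    """Human-readable label such as '0-1000', '1001-3000' or '3001+'."""
    lower = 0 if index == 0 else upper_bounds[index - 1] + 1
    if index == len(upper_bounds):
        return f"{lower}+"  # open-ended last bucket
    return f"{lower}-{upper_bounds[index]}"


def compress_ranges(zipcodes):
    """Collapse zipcode strings into runs like '00000-00002,00007'."""
    ranges = []
    start = prev = None
    for n in sorted(int(code) for code in zipcodes):
        if prev is not None and n == prev + 1:
            prev = n  # still inside a consecutive run
            continue
        if start is not None:
            ranges.append((start, prev))  # close the previous run
        start = prev = n
    if start is not None:
        ranges.append((start, prev))
    # Assumes 5-digit zipcodes, so re-pad with leading zeros
    return ",".join(f"{s:05d}" if s == e else f"{s:05d}-{e:05d}" for s, e in ranges)


def group_zipcodes(populations, upper_bounds):
    """Map each bucket label to the compressed zipcode ranges it contains."""
    members = {}
    for zipcode, population in populations.items():
        members.setdefault(bucket_index(population, upper_bounds), []).append(zipcode)
    return {bucket_label(i, upper_bounds): compress_ranges(members[i]) for i in sorted(members)}


# Hypothetical sample data: zipcode -> population
example = {"00000": 10, "00001": 500, "00002": 900, "00004": 2000, "00005": 2500, "00007": 50}
print(group_zipcodes(example, [1000, 3000]))
# {'0-1000': '00000-00002,00007', '1001-3000': '00004-00005'}
```

Passing the bounds as a parameter corresponds to having the buckets "defined by the request/query itself"; a fixed list gives the predefined-groups variant.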

Is it possible to write a CrateDB function that does something like this, so I can avoid the bandwidth cost of shipping all rows to a different service/container/VM to do the grouping there?

score 0 · Answer 1 · answered Mar 11 '19 at 12:25

You could probably create groups on the fly (or store them as columns, if you wish) with a regex, and then GROUP BY them. I have done this on a 23M-row table.

In my example, the regex grouping plus AVG took around 30 s, but that depends heavily on my hardware.

Something like this should work as a general pointer:

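-- Replace your_column, your_table and your_regex with your own names;
-- the first capture group of the regex becomes the group key.
SELECT avg(your_column) AS avg_value,
       regexp_matches(postcode, 'your_regex', 'i')[1] AS postcode_group
FROM "doc"."your_table"
GROUP BY regexp_matches(postcode, 'your_regex', 'i')[1]
ORDER BY regexp_matches(postcode, 'your_regex', 'i')[1];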
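To make the grouping logic concrete, here is a rough Python equivalent over in-memory rows. It uses the hypothetical pattern `^(\d{3})`, which groups 5-digit postcodes by their first three digits, and hypothetical sample values; it only illustrates what the GROUP BY computes, not how CrateDB executes it.

```python
import re
from collections import defaultdict


def avg_by_regex_group(rows, pattern):
    """Average `value` per first capture group of `pattern` applied to `postcode`."""
    regex = re.compile(pattern, re.IGNORECASE)  # mirrors the 'i' flag
    sums = defaultdict(float)
    counts = defaultdict(int)
    for postcode, value in rows:
        match = regex.search(postcode)
        key = match.group(1) if match else None  # unmatched rows share a None group
        sums[key] += value
        counts[key] += 1
    # Order by the group key, putting the None group last
    keys = sorted(sums, key=lambda k: (k is None, k or ""))
    return [(key, sums[key] / counts[key]) for key in keys]


# Hypothetical rows: (postcode, value to average)
rows = [("02004", 1200), ("02010", 800), ("02134", 3000), ("10001", 5000)]
print(avg_by_regex_group(rows, r"^(\d{3})"))
# [('020', 1000.0), ('021', 3000.0), ('100', 5000.0)]
```

A coarser or finer prefix (e.g. two or four digits) trades result size against precision, which fits the "ballpark" requirement in the question.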

You could also use a window function (OVER), but at the time of writing CrateDB did not yet have full SQL support for window functions (partitioning, etc.).
