1

I have a field in bytea format, and I'm trying to calculate how many zero bytes are in the field (postgresql).

Example String:

0x202ee0ed0000000000000000000000000000000000000000000000000000000000014370000000000000000000000000000000000000000000000000000000003f8affe7

I originally tried to do this by counting how often 00 occurred, but this can be inaccurate since that doesn't check if it's a byte or two 0s that happen to be next to each other.

I'm currently using this regex_replace (found from another question) to force a _ in between bytes so that I can then count 00 occurrences, but this slows down the query by multiple magnitudes vs a simple replace on the order of 100k-1m+ rows: regexp_replace(data::text, '(..)', E'\\1_', 'g') - Produces something like 20_2e_e0...

I'm wondering if anyone knows of a more performant way to count the number of zero or non-zero bytes in a bytea/string?

Michael S
  • 25
  • 4

1 Answers1

0

There is no very simple and efficient way in SQL, I think. The best I can come up with is:

SELECT cardinality(
          string_to_array(
             encode('\xDEADF00D0000', 'escape'),
             '\000'
          )
       ) - 1;

 ?column? 
══════════
        2
(1 row)
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • 1
    Hi, thanks! After posting this, I found a similar answer from random experimenting (where `data` is my bytea string): `SELECT COALESCE( (char_length( encode(data,'escape') ) - char_length(REPLACE(encode(data,'escape'),'\000','')) ) /4 ,0) AS zero_bytes` I need to learn more about how "escape" actually works to validate that it's correct, but I'll definitely see how performance compares to you solution as well. Thanks again! – Michael S Nov 06 '21 at 19:04
  • That is also correct. – Laurenz Albe Nov 06 '21 at 20:22