2

I have BIGINT column in my Redshift table, and I want a query that will:

  1. Count how many times the value '1' appears in each bit position across the binary value in all the rows of this column
  2. Will show it in a way that I'll be able to take the x top bits_positions.

For example (I'm already writing the integer values as binary to simplify the example):

column
--------
11011110  = 222
00000000  = 0
11111100  = 252
00011000  = 24
11111100  = 252
00011000  = 24
11000010  = 194

76543210 <- bit_position

will return a table like:

bit_position   count
0              0
1              2
2              3
3              5
4              5
5              2
6              4
7              4

In this case I'll be able to get the top five bit_position: (3,4,6,7,2)

Note: I'll might have up to 64 bit_positions for a column.

JustinCase
  • 171
  • 1
  • 11

1 Answers1

3

You can use a bit-wise AND & to check for each position.

Here's an example going across rows:

SELECT SUM(CASE WHEN bit_col & 64 > 0 THEN 1 ELSE 0 END) "1000000"
     , SUM(CASE WHEN bit_col & 32 > 0 THEN 1 ELSE 0 END) "0100000"
     , SUM(CASE WHEN bit_col & 16 > 0 THEN 1 ELSE 0 END) "0010000"
     , SUM(CASE WHEN bit_col & 8 > 0 THEN 1 ELSE 0 END)  "0001000"
     , SUM(CASE WHEN bit_col & 4 > 0 THEN 1 ELSE 0 END)  "0000100"
     , SUM(CASE WHEN bit_col & 2 > 0 THEN 1 ELSE 0 END)  "0000010"
     , SUM(CASE WHEN bit_col & 1 > 0 THEN 1 ELSE 0 END)  "0000001"
FROM my_table
;
 1000000 | 0100000 | 0010000 | 0001000 | 0000100 | 0000010 | 0000001
---------+---------+---------+---------+---------+---------+---------
      11 |       8 |      11 |      13 |      11 |       9 |       8

To have the results in a single column you need to use union:

          SELECT 1 AS "col", SUM(CASE WHEN bit_col & 64 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 2 AS "col", SUM(CASE WHEN bit_col & 32 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 3 AS "col", SUM(CASE WHEN bit_col & 16 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 4 AS "col", SUM(CASE WHEN bit_col &  8 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 5 AS "col", SUM(CASE WHEN bit_col &  4 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 6 AS "col", SUM(CASE WHEN bit_col &  2 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 7 AS "col", SUM(CASE WHEN bit_col &  1 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
ORDER BY bit_count DESC
;
 position | bit_count
----------+-----------
        6 |         6
        7 |         6
        4 |         4
        5 |         4
        2 |         0
        3 |         0
        1 |         0

http://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html

EDIT: If you would like something more dynamic you will need to look into using a UDF. You could start with my f_bitwise_to_string UDF as a template and add what you need from there. https://github.com/awslabs/amazon-redshift-udfs/blob/master/scalar-udfs/f_bitwise_to_string.sql

Joe Harris
  • 13,671
  • 4
  • 47
  • 54
  • Thanks joe, the problem is, that as noted, I'll might have up to 64 bits for the column, and I'm looking for something more dynamic (so it will also be possible to cast over more than one column). The union for the transpose does the job, but also in that case, I look for something more dynamic if possible – JustinCase Jul 27 '17 at 07:04
  • If you would like something more dynamic you will need to look into using a UDF. You could start with my `f_bitwise_to_string` UDF as a template and add what you need from there. https://github.com/awslabs/amazon-redshift-udfs/blob/master/scalar-udfs/f_bitwise_to_string.sql – Joe Harris Jul 27 '17 at 12:43