2

I have a 16 byte md5 hash that I need to "fold" into 4-byte data using XOR: {1st 4 bytes} XOR {2nd 4 bytes} XOR {3rd 4 bytes} XOR {4th 4 bytes}. I then need to convert the result to Hex form (8 char string).

I'm generating my hash like this (decoding to hex as it seems like it'll be easier to deal with):

SELECT decode(md5('test'), 'hex');

But that's as far as I got. I don't know the best way split the 16 byte hash into 4, 4-byte values, then XOR those 4-byte values.

Jason
  • 93
  • 10
  • http://www.postgresql.org/docs/8.0/static/functions-bitstring.html – Marc B Mar 16 '16 at 19:40
  • @MarcB: I think the idea is to use the bitwise XOR operator. But I'm struggling with how to formulate this. I'm currently looking at [this](http://stackoverflow.com/a/17764568/2077023) stackoverflow answer for reference. But am having trouble converting it to my example. – Jason Mar 16 '16 at 19:53
  • It turns out that the actual question has nothing to do with MD5, just about xor-ing bytes in Postgres. – zaph Mar 16 '16 at 21:10
  • @zaph Yes, that is correct. I thought my question is fairly clear on that. However, the title was a bit misleading, so I've updated that. – Jason Mar 16 '16 at 21:18
  • well, `#` is the xor operator in postgress, so technically you just need some string and/or bit operations to extract the individual bytes, then just xor'em all together. – Marc B Mar 16 '16 at 21:32

2 Answers2

0

Unfortunately the docs are a bit vague about what you could do with a bit string value, but mentions the substring function (sytax is on the string functions page), which can be used to extract parts from it:

select i1 # i2 # i3 # i4
from   cast('x' || md5('test') as bit(128)) bits,
       cast(substring(bits from 97 for 32) as int4) i1,
       cast(substring(bits from 65 for 32) as int4) i2,
       cast(substring(bits from 33 for 32) as int4) i3,
       cast(substring(bits from  1 for 32) as int4) i4

Note: lower bits have higher index in their bit string representation, f.ex.

select 3::bit(32)
-- will yield '00000000000000000000000000000011'
pozs
  • 34,608
  • 5
  • 57
  • 63
  • Funny, I came up with a completely different way of doing the exact same thing (I'll post it as an alternative answer). BTW, I missed a piece in my initial question. I needed the final result hex form (8 char string). I think to do that you would just call "to_hex(i1 # i2 # i3 # i4)". – Jason Mar 17 '16 at 12:45
  • Also, according to [the postgres message board](http://www.postgresql.org/message-id/2946.1289597909@sss.pgh.pa.us), your technique "is relying on some undocumented behavior of the bit-type input converter". – Jason Mar 17 '16 at 13:13
  • @Jason this is now a well-documented feature on [bit strings](http://www.postgresql.org/docs/current/static/functions-bitstring.html): *In addition, it is possible to cast integral values **to and from** type bit.* And yes, if you need the hex representation, you could use `to_hex(i1 # i2 # i3 # i4)` (or `lpad(to_hex(i1 # i2 # i3 # i4), 8, '0')` for fixed width) – pozs Mar 17 '16 at 13:24
  • Oh ok. I guess that post was from back in 2011. Thanks. If you'll do me a favor and add the "to_hex" portion to your response, I'll mark your answer as the accepted one. – Jason Mar 17 '16 at 13:30
0

After spending some time understanding this answer, I was able to come up with this:

CREATE OR REPLACE FUNCTION compressed_md5(var_txt TEXT) RETURNS TEXT
AS $$
DECLARE
  var_hash BYTEA;
  var_compressedHash BYTEA;
  var_offset INTEGER;
BEGIN
  var_hash := decode(md5(var_txt), 'hex');
  var_compressedHash := decode('00000000', 'hex'); -- prepopulate with some 4-byte data
  FOR var_offset IN 0..3 LOOP
    var_compressedHash := set_byte(var_compressedHash, var_offset,
                                   get_byte(var_hash, var_offset) #
                                   get_byte(var_hash, var_offset + 4) #
                                   get_byte(var_hash, var_offset + 8) #
                                   get_byte(var_hash, var_offset + 12));
  END LOOP;
  RETURN encode(var_compressedHash, 'hex');
END;
$$ LANGUAGE plpgsql;

SELECT compressed_md5('test');

Result:

"a35742cb"
Community
  • 1
  • 1
Jason
  • 93
  • 10