Using Hive, I want to hash an entire row in the query.
I tried the following (don't mind the ${xxx}, query is built from a bash script):
SELECT md5(*) FROM ${DATABASE_NAME_SUFFIXE}.${DATABASE_PREFIXE}_${TABLE_NAME} WHERE
${TABLE_DATE_FIELD} <= '${LIMIT_DATE}' ORDER BY ${CREATION_DATE_FIELD} DESC LIMIT 1
This returns the following error:
Line 1:7 Wrong arguments 'md5': No matching method for class org.apache.hadoop.hive.ql.udf.UDFMd5 with (bigint, int, varchar(128), timestamp, timestamp, varchar(64), varchar(64), varchar(64), int, bigint, int, varchar(50), varchar(255), bigint, timestamp, timestamp, varchar(64), bigint, timestamp, timestamp, varchar(64), int, int, char(38), varchar(40), varchar(1)). Possible choices: FUNC(binary) FUNC(string)
If I understand correctly from the error and from the documentation of md5 function, I need to pass binary or string. How can I achieve that ?
Edit: also tried:
SELECT md5(SELECT * FROM ${DATABASE_NAME_SUFFIXE}.${DATABASE_PREFIXE}_${TABLE_NAME} WHERE ${TABLE_DATE_FIELD} <= '${LIMIT_DATE}' ORDER BY ${CREATION_DATE_FIELD} DESC LIMIT 1)
which returns
cannot recognize input near 'SELECT' '*' 'FROM' in function specification