I have a table with this structure:
user_id | message_id | content
1 | 1 | "I like cats"
1 | 1 | "I like dogs"
And a list of valid words in dictionary.txt
(or an external hive table), for example:
I,like,dogs,cats,lemurs
And my goal is to generate an word-count table for each user
user_id | "I" | "like" | "dogs" | "cats" | "lemurs"
1 | 2 | 2 | 1 | 1 | 0
This is what I tried so far:
SELECT user_id, word, COUNT(*)
FROM messages LATERAL VIEW explode(split(content, ' ')) lTable as word
GROUP BY user_id,word;