0

Hi I have a document uploaded to a Hive table named Data with sample lines like below:

He is a good boy and but his brother is a bad boy.
He is a naughty boy.

The table's schema is:

create table Data(
    document_data STRING)
row format delimited
fields terminated by '\n'
stored as textfile;

I want to write a query that counts the occurrences of just the words boy and naughty` and outputs them as such:

 boy 3
 naughty 1 
Ben Watson
  • 5,357
  • 4
  • 42
  • 65
JYOTI RANJAN PANDA
  • 122
  • 1
  • 1
  • 6

1 Answers1

0

Here we will make use of the LATERAL capability which enables a single row to be transformed into several.

SELECT
    word,
    COUNT(*)
FROM Data
WHERE
    word="boy" OR
    word="naughty"
LATERAL VIEW 
    explode(split(document_data, ' ')) lateralTable AS word GROUP BY word;

I adapted a version that I found at Word Count program in Hive.

Community
  • 1
  • 1
Ben Watson
  • 5,357
  • 4
  • 42
  • 65