I am trying to process sample tweets and store the ones that match a filter criterion.

For example, given this sample tweet:

{"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a word about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}

twitter = LOAD 'Tweet.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
grouped = GROUP twitter BY (text,id);
filtered =FOREACH grouped { row = FILTER $1 BY (text MATCHES '.*word.*'); GENERATE FLATTEN(row);}

This returns the complete tweets that match the word.

But I need the output in this form:

(word)(all tweets that contain that word)

How can I achieve this?

Any help would be appreciated.

Mohan.V


1 Answer

After filtering, add the word as a constant field, say 'pattern', to the filtered relation and then group by that field. That will give you the word and a bag of the tweets that contain it.

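-- Load the tweets with the same schema as in the question.
twitter = LOAD 'Tweet.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
-- Filter the loaded relation directly; the earlier GROUP BY (text,id) is not needed.
filtered = FILTER twitter BY (text MATCHES '.*word.*');
-- Tag every matching tweet with the search word as a constant field.
newfiltered = FOREACH filtered GENERATE 'word' AS pattern, text;
-- Grouping by the constant yields one tuple: (word, {bag of matching tweets}).
final = GROUP newfiltered BY pattern;
DUMP final;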
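
If more than one word is needed, the same idea can be extended by tagging each filtered relation separately and combining them before grouping. The following is only a sketch under assumptions: the second search word 'cup' and the aliases word_hits, cup_hits, tagged and by_pattern are made up for illustration and do not come from the question.

```pig
-- Sketch only: 'cup' and all alias names below are illustrative assumptions.
twitter = LOAD 'Tweet.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- One filter per search word, each tagged with its own constant.
word_rows = FILTER twitter BY text MATCHES '.*word.*';
word_hits = FOREACH word_rows GENERATE 'word' AS pattern, text;

cup_rows = FILTER twitter BY text MATCHES '.*cup.*';
cup_hits = FOREACH cup_rows GENERATE 'cup' AS pattern, text;

-- Both relations share the schema (pattern, text), so they can be unioned.
tagged = UNION word_hits, cup_hits;

-- One output tuple per search word, each holding a bag of matching tweets.
by_pattern = GROUP tagged BY pattern;
DUMP by_pattern;
```

A tweet that contains both words appears in both bags, since each filter runs independently over the full set of tweets.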
nobody