Learning to use Perl-like regular expressions in Pig Latin

Question

Is there a way to extract certain words from a file in Pig Latin? For example, from a large file of tweets, I want every word that begins with a #.

Input :  What a lovely day! #Sunshine
Output : Sunshine

score 0 · Answer 1 · answered May 23 '14 at 21:56

Okay, using FILTER worked for me:

startswithHash = filter <> by <> matches '#.*';

Note that matches has to match the entire field, so '#.*' keeps only values that begin with a #.

Kaizzen

score 0 · Accepted Answer · answered May 24 '14 at 03:49

This should work (extracts the last word with a # in front of it from your_field):

REGEX_EXTRACT(your_field, '.*#(\\w+)($|\\s.*)', 1)
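
Because the leading .* is greedy, the expression above returns at most one hashtag per field. If every hashtag is needed, one option is to split each tweet into words first and then filter. What follows is a minimal sketch, not a tested script: the file name tweets.txt and the aliases tweets, words, hashtags and tags are assumptions made for illustration, while TextLoader, TOKENIZE, FLATTEN, matches and REGEX_EXTRACT are Pig built-ins.

```pig
-- Hypothetical input: one tweet per line in 'tweets.txt'.
tweets = LOAD 'tweets.txt' USING TextLoader() AS (line:chararray);

-- Split each tweet into words and emit one word per row.
words = FOREACH tweets GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Keep only words that begin with '#'; matches must cover the whole word.
hashtags = FILTER words BY word matches '#\\w+.*';

-- Drop the leading '#' and any trailing punctuation such as '!' or '.'.
tags = FOREACH hashtags GENERATE REGEX_EXTRACT(word, '#(\\w+).*', 1) AS tag;

DUMP tags;
```

For the sample tweet in the question, the only word that survives the filter is #Sunshine, so the extracted tag should be Sunshine.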

user2303197
