I have a text input that uses '|' as the separator, for example:

0.0000|25000|                    |BM|BM901002500109999998|SZ

which I split using PigStorage

A = LOAD '/user/hue/data.txt' using PigStorage('|');

Now I need to split the field BM901002500109999998 into separate fields based on character position, e.g. positions 0-2 = BM become Field1, and likewise for the rest. After this step I should get BM, 90100, 2500, 10, 9999998. Is there any way to achieve this in a Pig script? Otherwise I plan to write a UDF that inserts a separator at the required positions.

Thanks.

Abhi
    Are you looking for SUBSTRING? http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/SUBSTRING.html – kanchirk May 19 '15 at 15:49

2 Answers

You are looking for SUBSTRING:

A = LOAD '/user/hue/data.txt' USING PigStorage('|');
B = FOREACH A GENERATE
    SUBSTRING($4, 0, 2)   AS FIELD_1,
    SUBSTRING($4, 2, 7)   AS FIELD_2,
    SUBSTRING($4, 7, 11)  AS FIELD_3,
    SUBSTRING($4, 11, 13) AS FIELD_4,
    SUBSTRING($4, 13, 20) AS FIELD_5;

The output would be:

dump B;
(BM,90100,2500,10,9999998)

You can find more info about this function here.
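
If positional references such as $4 feel brittle, one option is to declare a schema at load time and refer to the field by name. This is only a minimal sketch, assuming the six-column layout of the sample row in the question; the field names (f0, code, and so on) are made up for illustration:

```pig
-- Field names are hypothetical; only the column order comes from the sample row.
A = LOAD '/user/hue/data.txt' USING PigStorage('|')
    AS (f0:chararray, f1:chararray, f2:chararray,
        f3:chararray, code:chararray, f5:chararray);

-- SUBSTRING(string, startIndex, stopIndex): start is inclusive, stop is exclusive.
B = FOREACH A GENERATE
    SUBSTRING(code, 0, 2)   AS field_1,
    SUBSTRING(code, 2, 7)   AS field_2,
    SUBSTRING(code, 7, 11)  AS field_3,
    SUBSTRING(code, 11, 13) AS field_4,
    SUBSTRING(code, 13, 20) AS field_5;

DUMP B;
```

Naming the column keeps the SUBSTRING calls readable and means they do not need to change if columns are later added in front of it.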

Balduz

I think it will be much more efficient to use the built-in UDF REGEX_EXTRACT_ALL.
You can get some idea of how to use this UDF from:
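
As a rough sketch (not taken from the linked reference), the fixed-width split could be expressed as a single pattern with one capture group per field. The group widths 2, 5, 4, 2 and 7 are assumptions derived from the sample value BM901002500109999998, and the output field names are hypothetical:

```pig
A = LOAD '/user/hue/data.txt' USING PigStorage('|');

-- One capture group per fixed-width field: 2 + 5 + 4 + 2 + 7 = 20 characters.
-- REGEX_EXTRACT_ALL returns a tuple of the captured groups; FLATTEN turns it into columns.
B = FOREACH A GENERATE
    FLATTEN(
        REGEX_EXTRACT_ALL((chararray)$4, '(.{2})(.{5})(.{4})(.{2})(.{7})')
    ) AS (field_1:chararray, field_2:chararray, field_3:chararray,
          field_4:chararray, field_5:chararray);

DUMP B;
```

By default the pattern has to match the whole value, so a row whose fifth column is not exactly 20 characters long would not be split; such rows may need to be filtered out or handled separately.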

Zach Beniash