I'm storing a Hive table externally, and it's a pretty simple data structure. The table is created in Hive as

(user string, names array<string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\001'
STORED AS TEXTFILE

(I've tried other delimiters, too).

In Pig, I can't seem to figure out the right way to use a bag or tuple to just load a simple array! Here's what I've tried without luck:

users = load '<file>' using PigStorage() AS (user:chararray, names:bag{tuple(name:chararray)})

users = load '<file>' using PigStorage() AS (user:chararray, names:chararray)

and some other things, but the best I've gotten was to have them loaded as a single string with the delimiter removed (which doesn't help). How do I just load a variable-length array of strings?

thanks

JayC

1 Answer

Let's say you have the following data in the /user/hdfs/tester/ip/test file on HDFS:

cat test:
1   A,B
2   C,D,E,F
3   G
4   H,I,J,K,L,M

In Pig (MapReduce mode), do the following:

a = LOAD '/user/hdfs/tester/ip/test' USING PigStorage('\t') as (id:INT,names:chararray);
b = FOREACH a GENERATE id, FLATTEN(TOBAG(STRSPLIT(names,','))) as value:tuple(name:CHARARRAY);

The first column is the id, and value is a tuple of chararray fields produced by splitting names on the comma.
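
If what you want is a variable-length bag rather than a flattened tuple, a hedged alternative is Pig's built-in TOKENIZE, which in newer releases (assumed here: Pig 0.10 or later) takes an optional delimiter and returns a bag of single-field tuples. This is only a sketch against the sample file above; the aliases b and c and the field name name are illustrative, not from the original answer.

```pig
-- Load the same tab-separated file; names is still one comma-separated string.
a = LOAD '/user/hdfs/tester/ip/test' USING PigStorage('\t') AS (id:int, names:chararray);

-- TOKENIZE splits on ',' and returns a bag of one-field tuples, e.g. {(A),(B)} for "A,B".
-- Assumption: the two-argument form of TOKENIZE is available in your Pig version.
b = FOREACH a GENERATE id, TOKENIZE(names, ',') AS names;

-- Optional: flatten the bag to get one (id, name) row per element.
c = FOREACH b GENERATE id, FLATTEN(names) AS name;
```

Keeping the bag in b preserves one row per id, while c is the shape to use if each name needs its own row.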

New Coder
  • It doesn't seem to like FLATTEN(TOBAG()), but string splitting was the obvious answer that I was too boneheaded to use. Thank you! I still ended up with the problem of not being able to define the schema to inject this into Mongo, and had to add a value to the string 'keys' so that I could define the schema with a bag of tuples and thus inject it into Mongo (since the connector really prefers a schema). Silly Pig needs basic arrays! (in my opinion) – JayC Aug 11 '15 at 18:27
  • I have my data in this format: `{"id": "59b6808364fdb09cde10ad3b","balance": "$1,972.02","age": 35,"eyeColor": "green","tags": ["aute","nostrud","pariatur","adipisicing","irure"]}`. What should the Pig script for loading the tags column look like? I tried `tags:chararray`, but it did not help. – nishant Sep 11 '17 at 14:14