
I'm trying to load a CSV file in Pig Latin. The record format is as follows: "ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010

I tried the following code:

A = LOAD '/user/hduser/salaryTravel.csv' using PigStorage(',')  AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);

But the output is as follows:

("ABBOTT,DEEDEE W",,,122.10",0,)

The name field is read as two separate fields because it contains a comma (','). How can I read this record so that quoted fields stay intact?

Niyas
  • From your output it looks like name is one field because it is surrounded by quotes. If it was separate fields it would look like ("ABBOTT","DEEDEE W",,,"52","122.10", etcetcetc). – Scott Conway Aug 21 '15 at 12:18
  • I don't think so. The output has 7 fields as expected, although it is not correct. So I think "ABBOTT and DEEDEE W" are two fields. Anyway, do you know how to read it as a single field? – Niyas Aug 21 '15 at 13:31
  • Ahh sorry I misread your output. No the problem is not with the name - it's all the other fields. I think that the lack of quotes is throwing Pig off. First try the same thing with quotes around all fields and if that does not work then put quotes around only the fields you have defined as chararray. Because I notice that both name and job are defined as chararray and I can see name but not job. – Scott Conway Aug 21 '15 at 17:40

1 Answer


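I would suggest using CSVExcelStorage or CSVLoader from Piggybank to load the data. Unlike PigStorage(','), both treat a quoted value such as "ABBOTT,DEEDEE W" as a single field.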

REGISTER piggybank.jar;

A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage()  AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);

or

REGISTER piggybank.jar;

A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader()  AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);

Reference: "REGEX_EXTRACT error in PIG", where I have shared a few code samples.
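To see why the original output looks the way it does, here is a small Python sketch (not Pig, purely an illustration) that compares splitting the sample record on every comma with quote-aware CSV parsing. It assumes PigStorage(',') behaves roughly like a plain split, which is consistent with the output in the question: the first seven of the nine pieces line up with ("ABBOTT,DEEDEE W",,,122.10",0,) once the non-numeric pieces fail their float/int casts. The variable names are made up for this example.

```python
import csv
import io

# The sample record from the question.
record = ('"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,'
          'ATLANTA INDEPENDENT SCHOOL SYSTEM,2010')

# Splitting on every comma ignores the quotes (assumed to be roughly
# what a plain delimiter-based loader does).
naive_fields = record.split(',')

# Quote-aware parsing keeps quoted values together, which is the
# behaviour a CSV-specific loader is meant to provide.
csv_fields = next(csv.reader(io.StringIO(record)))

# The salary value still contains a thousands separator after parsing,
# so it is stripped here before converting to a number.
salary = float(csv_fields[2].replace(',', ''))

print(len(naive_fields), naive_fields[:2])
print(len(csv_fields), csv_fields[0], salary)
```

Note that the salary value "52,122.10" keeps its embedded comma even when the quotes are handled correctly, so it may not cast cleanly to a float in Pig either. If that turns out to be the case, one option is to load that column as chararray and remove the comma before casting.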

Murali Rao