
I am using a Cloudera CDH3 cluster in pseudo-distributed mode. The Pig version in CDH3 is 0.8.

I would like to read a CSV or Excel file using a Pig script.

I downloaded piggybank-0.11.0.jar and placed it in the /home/cloudera/ directory.

My CSV file looks like this:

id    name       city
100   surrender  Chennai
101   raja       Chennai

My Pig script is:

REGISTER '/home/cloudera/piggybank-0.11.0.jar';

A = LOAD '/user/cloudera/inputfiles/sample_rec.csv' USING CSVExcelStorage(',') AS (id:int,name:chararray,city:chararray);
DUMP A;

But I am getting the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve CSVExcelStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.

Do I need to download a piggybank jar built for Pig 0.8?

What is wrong here? Is it possible to read a CSV file with Pig 0.8?

Surender Raja
  • 1. Unjar the piggybank jar and check whether it contains the CSVExcelStorage class. 2. "," is the default delimiter for CSVExcelStorage, so you need not specify it. – Murali Rao Jul 16 '15 at 18:41
  • 3. Specify the complete package name when using CSVExcelStorage(): USING org.apache.pig.piggybank.storage.CSVExcelStorage() – Murali Rao Jul 16 '15 at 18:48

1 Answer


Specify the complete package name when using CSVExcelStorage():

USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS ...
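Applied to the question's script, the fix looks roughly like the sketch below. The error message lists the only packages Pig searches for short names (org.apache.pig.builtin. and org.apache.pig.impl.builtin.), so the piggybank class has to be named in full. The jar path and input path are taken from the question; whether piggybank-0.11.0.jar runs correctly under Pig 0.8 is an open assumption that this thread never settles.

```pig
-- Register the piggybank jar (path as given in the question; adjust if needed).
REGISTER '/home/cloudera/piggybank-0.11.0.jar';

-- Short names are only looked up in Pig's built-in packages,
-- so use the fully qualified class name of the loader.
A = LOAD '/user/cloudera/inputfiles/sample_rec.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (id:int, name:chararray, city:chararray);

-- DUMP is a statement on its own; its result is not assigned to an alias.
DUMP A;
```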

Other checks:

  1. Unjar the piggybank jar and check whether it contains the CSVExcelStorage class.

  2. "," is the default delimiter for CSVExcelStorage, so you need not specify it.
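If typing the full class name on every LOAD gets tedious, Pig's DEFINE statement (a standard Pig feature, not mentioned in the answer) can bind a short alias to it once. The sketch below also relies on the default "," delimiter from point 2, so no argument is passed. As before, it assumes the 0.11 piggybank jar works under Pig 0.8.

```pig
REGISTER '/home/cloudera/piggybank-0.11.0.jar';

-- Bind a short alias to the fully qualified class once, so later
-- statements can use the short name without triggering error 1070.
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage();

-- No delimiter argument: CSVExcelStorage falls back to its default ','.
A = LOAD '/user/cloudera/inputfiles/sample_rec.csv'
    USING CSVExcelStorage()
    AS (id:int, name:chararray, city:chararray);

DUMP A;
```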

Another alternative is to use CSVLoader:

 A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);

Ref : http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/CSVLoader.html
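A fuller CSVLoader script for the question's file might look like this sketch. It assumes the file really begins with the id/name/city header line shown in the question: that line's "id" value cannot be cast to int, so Pig loads it with a null id, and the FILTER step (an addition, not part of the answer) drops it.

```pig
REGISTER '/home/cloudera/piggybank-0.11.0.jar';

-- CSVLoader is called with no arguments, as in the answer above.
A = LOAD '/user/cloudera/inputfiles/sample_rec.csv'
    USING org.apache.pig.piggybank.storage.CSVLoader()
    AS (id:int, name:chararray, city:chararray);

-- Assumed header line: its 'id' fails the int cast and loads as null.
B = FILTER A BY id IS NOT NULL;

DESCRIBE B;  -- print the schema declared in the AS clause
DUMP B;
```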

Murali Rao
  • OK, I tried that, but it gives some junk records when I dump the output. I am using Pig 0.8 inside CDH3, but with piggybank-0.11.0.jar. Is that the problem? Is CSVExcelStorage available in Pig 0.8? – Surender Raja Jul 17 '15 at 04:02
  • @SurenderRaja: Can you use CSVLoader instead? Ref : http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/CSVLoader.html – Murali Rao Jul 17 '15 at 05:05
  • OK, I tried CSVLoader. Input(s): Successfully read 52 records (9205 bytes) from: "/user/cloudera/inputfiles/sample_rec.csv" Output(s): Successfully stored 52 records (1171 bytes) in: "hdfs://localhost/tmp/temp1988488632/tmp-1068001496" (,,) (,���o8u����+�`�ӡ���`��B[��lC|�,) (,,) (,,) (,;�,) (,,) – Surender Raja Jul 21 '15 at 17:09
  • OK, I am trying the code below: REGISTER '/home/cloudera/surender/mapreducejars/piggybank-0.11.0.jar'; A = LOAD '/user/cloudera/inputfiles/sample_rec.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (id:int,name:chararray,city:chararray); dump A; Input(s): Successfully read 52 records (9205 bytes) from: "/user/cloudera/inputfiles/sample_rec.csv" Output(s): Successfully stored 52 records (1171 bytes) in: "hdfs://localhost/tmp/temp1988488632/tmp-1068001496" (,���o8u����+�`�ӡ���`��B[��lC|�,) – Surender Raja Jul 21 '15 at 17:16
  • My question is: does '/home/cloudera/surender/mapreducejars/piggybank-0.11.0.jar' work in Cloudera CDH3 or not? – Surender Raja Jul 21 '15 at 17:17
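One way to narrow down that last question is to load the same file with Pig's built-in PigStorage loader, which ships with Pig itself and needs no piggybank jar. This is a minimal diagnostic sketch. If this dump also prints binary junk, the input file is probably not plain comma-separated text; if it prints clean tuples, the piggybank jar becomes the more likely suspect. Note that PigStorage splits on every comma and does not understand quoted fields.

```pig
-- Built-in loader only: no REGISTER, no piggybank classes involved.
A = LOAD '/user/cloudera/inputfiles/sample_rec.csv'
    USING PigStorage(',')
    AS (id:int, name:chararray, city:chararray);

DUMP A;
```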