
I am using IBM BigInsights. When I execute the DUMP command in the Pig Grunt shell, I am not getting any results.

Sample Input file:

s_no,name,DOB,mobile_no,email_id,country_code,sex,disease,age
11111,bbb1,12-10-1950,1234567890,bbb1@xxx.com,1111111111,M,Diabetes,78
11112,bbb2,12-10-1984,1234567890,bbb2@xxx.com,1111111111,F,PCOS,67
11113,bbb3,712/11/1940,1234567890,bbb3@xxx.com,1111111111,M,Fever,90
11114,bbb4,12-12-1950,1234567890,bbb4@xxx.com,1111111111,F,Cold,88
11115,bbb5,12/13/1960,1234567890,bbb5@xxx.com,1111111111,M,Blood Pressure,76

INFO  [JobControl] org.apache.hadoop.mapreduce.lib.input.FileInputFormat     - Total input paths to process : 1

My code is as follows:

    A = LOAD 'healthcare_Sample_dataset1.csv' AS (s_no:long, name:chararray, DOB:datetime, mobile_no:long, email_id:chararray, country_code:long, sex:chararray, disease:chararray, age:int);
    B = FOREACH A GENERATE name;
    C = LIMIT B 5;
    DUMP C;

Kindly help me to resolve this.

Thanks and Regards!!!

Ananth Francis

3 Answers

0

From your script I can see that you are using a CSV file. If you are working with a CSV file, you should use CSVLoader() in your Pig script. Your script should look like this:

-- Register the piggybank jar, which contains the CSVLoader UDF
REGISTER piggybank.jar

-- Define the UDF
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

--Load data using CSVLoader

A = load '/user/biadmin/test/CBTTickets.csv' using CSVLoader AS (
                    Type:chararray,
                    Id:int,
                    Summary:chararray,
                    OwnedBy:chararray,
                    Status:chararray,
                    Priority:chararray,
                    Severity:chararray,
                    ModifiedDate:datetime,
                    PlannedFor:chararray,
                    TimeSpent:int);


B = FOREACH A GENERATE Type; 
C = LIMIT B 5; 
DUMP C;

Please provide your input data if it does not work for you.

Sandeep Singh
  • This does not answer the question. Using `CSVLoader` instead of `PigStorage` has nothing to do with his error; in fact, if his CSV is separated by tabs, what he is using to load the data is correct. – Balduz May 13 '15 at 09:54
  • Do we need to download piggybank.jar, or will it already be there on the Pig bin path? – Ananth Francis May 13 '15 at 10:01
  • Then what could be the reason for my issue, @Balduz? – Ananth Francis May 13 '15 at 10:02
  • According to the log message, it seems there is no issue with his code. I have just suggested the ideal way to deal with CSV files. The piggybank jar should be available; you do not need to download it. – Sandeep Singh May 13 '15 at 10:03
  • @AnanthFrancis unless you add your input data, we cannot help... Your code seems ok. – Balduz May 13 '15 at 10:07
  • I am getting the below error while running REGISTER piggybank.jar: `grunt> REGISTER piggybank.jar` ERROR [main] org.apache.pig.tools.grunt.Grunt - ERROR 101: file 'piggybank.jar' does not exist. Details at logfile: /opt/ibm/biginsights/bin/pig_1431510561177.log – Ananth Francis May 13 '15 at 10:08
  • Can you try to locate it? It should be available in the **/usr/lib/pig/** directory. You should run the command: **locate piggybank.jar** – Sandeep Singh May 13 '15 at 10:17
  • I just noticed the issue. It seems the CSV file (the input file) is not comma-separated. – Ananth Francis May 13 '15 at 11:12
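
If, as the last comment suggests, the fields are not actually separated by commas, passing the real delimiter to PigStorage should make the DUMP return data. This is only a sketch; the tab shown below is an assumption, so substitute whatever character actually separates the fields:

    -- Sketch: load with an explicit field delimiter ('\t' is an assumed example)
    A = LOAD 'healthcare_Sample_dataset1.csv' USING PigStorage('\t')
        AS (s_no:long, name:chararray, DOB:chararray, mobile_no:long,
            email_id:chararray, country_code:long, sex:chararray,
            disease:chararray, age:int);
    -- DOB is kept as chararray here because the sample mixes date formats
    B = FOREACH A GENERATE name;
    C = LIMIT B 5;
    DUMP C;
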
0

You have not given the full path of healthcare_Sample_dataset1.csv; that is why DUMP is not working properly. Load the data using the full path of that file and DUMP will work.
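
For example, a minimal sketch assuming the file sits under the user's HDFS home directory (the /user/biadmin/... path and the comma delimiter below are assumptions, not taken from the question):

    -- Sketch: load via the full path (the path shown is only an assumed example)
    A = LOAD '/user/biadmin/healthcare_Sample_dataset1.csv' USING PigStorage(',')
        AS (s_no:long, name:chararray, DOB:chararray, mobile_no:long,
            email_id:chararray, country_code:long, sex:chararray,
            disease:chararray, age:int);
    DUMP A;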

0

I think you need to load all fields as bytearray and then remove the first row (i.e. the header), because the header values don't match the data types you want to impose on those fields. Alternatively, remove the first row with a text editor and use your own code.
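
A minimal sketch of that first approach, assuming a comma-separated file (chararray is used here instead of bytearray, which serves the same purpose of deferring the typing):

    -- Sketch: load everything untyped so the header row does not clash with the schema
    raw = LOAD 'healthcare_Sample_dataset1.csv' USING PigStorage(',')
          AS (s_no:chararray, name:chararray, DOB:chararray, mobile_no:chararray,
              email_id:chararray, country_code:chararray, sex:chararray,
              disease:chararray, age:chararray);

    -- Drop the header row by matching the literal column name in the first field
    no_header = FILTER raw BY s_no != 's_no';

    -- Cast the numeric fields; DOB stays chararray because the sample mixes date formats
    typed = FOREACH no_header GENERATE
                (long)s_no AS s_no, name, DOB, (long)mobile_no AS mobile_no,
                email_id, (long)country_code AS country_code, sex, disease,
                (int)age AS age;

    DUMP typed;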

Atilla