
I am using IBM BigInsights. When I execute the DUMP command in the Pig Grunt shell, I am not getting any results.

Sample Input file:

s_no,name,DOB,mobile_no,email_id,country_code,sex,disease,age
11111,bbb1,12-10-1950,1234567890,bbb1@xxx.com,1111111111,M,Diabetes,78
11112,bbb2,12-10-1984,1234567890,bbb2@xxx.com,1111111111,F,PCOS,67
11113,bbb3,712/11/1940,1234567890,bbb3@xxx.com,1111111111,M,Fever,90
11114,bbb4,12-12-1950,1234567890,bbb4@xxx.com,1111111111,F,Cold,88
11115,bbb5,12/13/1960,1234567890,bbb5@xxx.com,1111111111,M,Blood Pressure,76

INFO  [JobControl] org.apache.hadoop.mapreduce.lib.input.FileInputFormat     - Total input paths to process : 1

My code is as follows:

    A = LOAD 'healthcare_Sample_dataset1.csv' AS (s_no:long, name:chararray, DOB:datetime, mobile_no:long, email_id:chararray, country_code:long, sex:chararray, disease:chararray, age:int);
    B = FOREACH A GENERATE name;
    C = LIMIT B 5;
    DUMP C;

Kindly help me to resolve this.

Thanks and Regards!!!

Ananth Francis

3 Answers

0

From your script I can see that you are using a CSV file. If you are working with a CSV file, you should use CSVLoader() in your Pig script. Your script should look like this:

-- Register the piggybank jar, which contains the CSVLoader UDF
REGISTER piggybank.jar

-- Define the UDF
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

--Load data using CSVLoader

A = load '/user/biadmin/test/CBTTickets.csv' using CSVLoader AS (
                    Type:chararray,
                    Id:int,
                    Summary:chararray,
                    OwnedBy:chararray,
                    Status:chararray,
                    Priority:chararray,
                    Severity:chararray,
                    ModifiedDate:datetime,
                    PlannedFor:chararray,
                    TimeSpent:int);


B = FOREACH A GENERATE Type; 
C = LIMIT B 5; 
DUMP C;

Please provide your input data if it does not work for you.

Sandeep Singh
  • This does not answer the question. Using `CSVLoader` instead of `PigStorage` has nothing to do with his error; in fact, if his CSV is separated by tabs, what he is using to load the data is correct. – Balduz May 13 '15 at 09:54
  • Do we need to download piggybank.jar, or will it already be there on the Pig bin path? – Ananth Francis May 13 '15 at 10:01
  • Then what could be the reason for my issue, @Balduz? – Ananth Francis May 13 '15 at 10:02
  • According to the log message, it seems there is no issue with his code. I have just suggested the ideal way to deal with CSV files. The piggybank jar should be available; you do not need to download it. – Sandeep Singh May 13 '15 at 10:03
  • @AnanthFrancis unless you add your input data, we cannot help... Your code seems ok. – Balduz May 13 '15 at 10:07
  • I am getting the below error while running REGISTER piggybank.jar: `grunt> REGISTER piggybank.jar` ERROR [main] org.apache.pig.tools.grunt.Grunt - ERROR 101: file 'piggybank.jar' does not exist. Details at logfile: /opt/ibm/biginsights/bin/pig_1431510561177.log – Ananth Francis May 13 '15 at 10:08
  • Can you try to locate it? It should be available in the **/usr/lib/pig/** directory. You should run the command: **locate piggybank.jar** – Sandeep Singh May 13 '15 at 10:17
  • I just noticed the issue. It seems the CSV file (the input file) is not comma-separated. – Ananth Francis May 13 '15 at 11:12
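
If, as the last comment suggests, the fields are not actually separated by commas, passing the real delimiter to PigStorage should make the DUMP return data. This is only a sketch; the tab shown below is an assumption, so substitute whatever character actually separates the fields:

    -- Sketch: load with an explicit field delimiter ('\t' is an assumed example)
    A = LOAD 'healthcare_Sample_dataset1.csv' USING PigStorage('\t')
        AS (s_no:long, name:chararray, DOB:chararray, mobile_no:long,
            email_id:chararray, country_code:long, sex:chararray,
            disease:chararray, age:int);
    -- DOB is kept as chararray here because the sample mixes date formats
    B = FOREACH A GENERATE name;
    C = LIMIT B 5;
    DUMP C;
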
0

You have not given the full path of healthcare_Sample_dataset1.csv; that is why DUMP is not working properly. Load the data using the full path of that file and DUMP will work.
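
For example, a minimal sketch assuming the file sits under the user's HDFS home directory (the /user/biadmin/... path and the comma delimiter below are assumptions, not taken from the question):

    -- Sketch: load via the full path (the path shown is only an assumed example)
    A = LOAD '/user/biadmin/healthcare_Sample_dataset1.csv' USING PigStorage(',')
        AS (s_no:long, name:chararray, DOB:chararray, mobile_no:long,
            email_id:chararray, country_code:long, sex:chararray,
            disease:chararray, age:int);
    DUMP A;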

0

I think you need to load all fields as bytearray and then remove the first row (i.e. the header), because the header values don't match the data types you want to impose on those fields. Alternatively, remove the first row with a text editor and use your own code.
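
A minimal sketch of that first approach, assuming a comma-separated file (chararray is used here instead of bytearray, which serves the same purpose of deferring the typing):

    -- Sketch: load everything untyped so the header row does not clash with the schema
    raw = LOAD 'healthcare_Sample_dataset1.csv' USING PigStorage(',')
          AS (s_no:chararray, name:chararray, DOB:chararray, mobile_no:chararray,
              email_id:chararray, country_code:chararray, sex:chararray,
              disease:chararray, age:chararray);

    -- Drop the header row by matching the literal column name in the first field
    no_header = FILTER raw BY s_no != 's_no';

    -- Cast the numeric fields; DOB stays chararray because the sample mixes date formats
    typed = FOREACH no_header GENERATE
                (long)s_no AS s_no, name, DOB, (long)mobile_no AS mobile_no,
                email_id, (long)country_code AS country_code, sex, disease,
                (int)age AS age;

    DUMP typed;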

Atilla