I'm trying to load a simple file:

log = load 'file_1.gz' using TextLoader AS (line:chararray);
dump log;

And I get an error:

2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)

Is it possible to handle this situation before the error occurs?

psmith
  • Pawel, did you find out how to handle this? I have the same scenario. Thanks – Govind Oct 13 '14 at 16:34
  • Same here. I also tried several regular expressions; none works as long as the pattern matches "0 files" – math_law Sep 09 '15 at 12:15
  • You could create an empty `blank` file and load with a pattern like this `/pko/{blank,file*gz}`. It will load 0 rows when no `file*gz`s exist. – Morozov Mar 15 '18 at 10:35

3 Answers

Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files

The error means that the input file doesn't exist at the given HDFS path.

In `log = load 'file_1.gz' using TextLoader AS (line:chararray);` you haven't given the absolute path of file_1.gz, so Pig resolves it against the HDFS home directory of the user running the Pig script.

KrazyGautam
  • I know the reason for the error. But my question is: is it possible to handle this kind of error in Pig, with something like try-catch? – psmith May 06 '15 at 10:09

Unfortunately, in the current version of Pig (0.15.0) it is impossible to handle these errors without using UDFs.

I suggest creating a Java or Python script using try and catch to take care of this.
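As a rough illustration of the wrapper-script idea, here is a minimal Python sketch that checks the input glob before Pig is launched at all, instead of catching the failure afterwards. It assumes the `hdfs` and `pig` commands are on the PATH and relies on `hdfs dfs -ls` exiting non-zero when nothing matches. The function names, the script name, and the `input` parameter are hypothetical; the Pig script would have to read its path from `$input`.

```python
import subprocess


def hdfs_glob_matches(pattern, run=subprocess.call):
    """Return True if `hdfs dfs -ls` finds at least one path for the glob."""
    # `hdfs dfs -ls` exits non-zero when the path or glob matches nothing.
    exit_code = run(
        ["hdfs", "dfs", "-ls", pattern],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return exit_code == 0


def run_pig_if_input_exists(pattern, script, run=subprocess.call):
    """Run the Pig script only if the input glob matches something.

    Returns True if Pig ran and exited with status 0, False otherwise.
    """
    if not hdfs_glob_matches(pattern, run=run):
        print("No input matches %s; skipping %s" % (pattern, script))
        return False
    # -param hands the pattern to the script as $input
    # (hypothetical parameter name, an assumption of this sketch).
    return run(["pig", "-param", "input=" + pattern, script]) == 0
```

A call such as `run_pig_if_input_exists("/pko/file*gz", "load_logs.pig")` would then skip the job quietly when the pattern matches no files. The `run` argument exists only so the command runner can be replaced in tests.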

Here's a good website that might be of some use to you: https://wiki.apache.org/pig/PigErrorHandlingInScripts

Good luck learning Pig!

Bryan Linton

I'm facing this issue as well. My load command is:

DATA = LOAD '${qurwf_folder_input}/data/*/' AS (...);

I want to load all files from the data subfolders, but the data folder was empty and I got the same error as you. What I did, in my particular case, was create an empty folder in the data directory, so the LOAD returns an empty dataset and the script does not fail.

By the way, I'm using an Oozie workflow to run the scripts, and I create the empty folders in the prepare step.
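For anyone not using Oozie, the same preparation can be sketched in Python. This is only an illustration, assuming the `hdfs` command is on the PATH. The function name and the folder name `empty` are hypothetical. The name deliberately avoids a leading `_` or `.`, because Hadoop input formats treat such paths as hidden and the glob would still match nothing.

```python
import subprocess


def ensure_placeholder_dir(data_dir, name="empty", run=subprocess.call):
    """Create an empty subfolder under data_dir so that 'data_dir/*/' matches."""
    path = data_dir.rstrip("/") + "/" + name
    # -p creates missing parents and does not fail if the folder already exists.
    exit_code = run(["hdfs", "dfs", "-mkdir", "-p", path])
    if exit_code != 0:
        raise RuntimeError("could not create %s (exit code %d)" % (path, exit_code))
    return path
```

Calling `ensure_placeholder_dir` on the data folder before starting the Pig script plays the same role as the Oozie prepare step described above.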

Sigrist