I have just started using Pig to analyze a bunch of log files using Hadoop, and I need to load different files depending on the output of a previous calculation. For example, if the output of the calculation is 0x18e0, I need to load a file called 0x18e0.txt. How do I give parameterized file names in the LOAD statement?

In Python, this is straightforward:

x = str(var)
f = open(x + '.txt', 'r')

Is there a similarly simple way to do this in Pig? I cannot pass the input on the command line like

pig -param input=x.txt

because I don't know the value of x before I run the script.

I see another option, specifying an input file itself as the parameter, as described at https://wiki.apache.org/pig/ParameterSubstitution, but this seems unduly roundabout. Is there another solution?

Ahmis
  • What kind of values can x take? – Dheeraj R Jul 27 '14 at 20:43
  • I've given an example above. x is just a string. In my case, x is a hexadecimal number that is represented as a string concatenated with a .txt, to make it a text file. – Ahmis Jul 28 '14 at 02:09
  • Would `%declare` answer your need? – merours Jul 28 '14 at 15:07
  • %declare is a preprocessing step. So I need to declare the variable via %declare at the top of the script. Can I do some computation and assign the output of that computation to a variable via %declare? I thought I couldn't do that. Please correct me if I'm wrong. – Ahmis Jul 29 '14 at 16:12

1 Answer

You can do this by (a) doing the preprocessing on the command line, or (b) using `%declare` and calling a Bash script.

Approach (a): In this example, whatever goes between the backticks (`) is the preprocessing that produces the hexadecimal number you want to use as the file name:

pig -param input=`hdfs dfs -cat file_list.txt | awk 'BEGIN{ORS="";}{if (NR == 1) print; else print ","$0;}'`.txt script.pig
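
The same flow as approach (a) can also be driven from a small Python wrapper instead of shell backticks: compute the value first, then hand the resulting file name to `pig -param`. This is only a sketch under assumptions: the computed value is taken to be an integer, the script name `script.pig` is reused from the command above, and `build_pig_command` is a hypothetical helper name, not part of Pig.

```python
import shlex
from typing import List


def build_pig_command(value: int, script: str = "script.pig") -> List[str]:
    """Build the argument list for `pig -param input=<hex>.txt <script>`.

    `value` stands in for the result of the earlier calculation
    (assumed here to be an integer).
    """
    # format(value, "#x") renders the integer as lowercase hex with a 0x prefix
    file_name = format(value, "#x") + ".txt"
    return ["pig", "-param", "input=" + file_name, script]


cmd = build_pig_command(0x18E0)
# Show the equivalent shell command line
print(shlex.join(cmd))  # prints: pig -param input=0x18e0.txt script.pig
```

Passing the list to `subprocess.run(cmd, check=True)` would launch Pig without going through a shell, so the file name needs no extra quoting. Inside the script, the parameter is referenced as `$input`.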

Approach (b): Create a Bash script that does the processing you need to get x:

#!/bin/bash

#HERE YOU PUT CODE THAT PRINTS OUT THE HEX NUMBER

Then write a Pig script as follows:

%declare x `./my_script.sh`

...

In approach (b), you don't really need to create a Bash script, since you could do the preprocessing with command-line tools inside the backticks, as in approach (a).

Similar approaches have been suggested in other StackOverflow answers. More details here and here.

cabad