I am trying to write a MapReduce job in which I need to iterate over the values twice.
Given a numerical CSV file, I need to apply equation (v1) below to each column.
For that I need to find the min and max values of the column and plug them into the equation.
What I did so far:
In map(), I emit the column id as the key and each value of that column as the value.
In reduce(), I calculate the min and max values of each column.
After that I am stuck.
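To make the two steps above concrete, here is a minimal plain-Java sketch that imitates them without Hadoop. It is only an illustration, not the actual job: the class name ColumnMinMax and its methods are hypothetical, the first sample row is the iris row quoted in this question, and the second row is just an illustrative value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job; no Hadoop classes are used here.
class ColumnMinMax {
    // "Map" step: split each CSV row and group the values by column index.
    static Map<Integer, List<Double>> groupByColumn(List<String> rows) {
        Map<Integer, List<Double>> columns = new TreeMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            for (int col = 0; col < fields.length; col++) {
                columns.computeIfAbsent(col, k -> new ArrayList<>())
                       .add(Double.parseDouble(fields[col].trim()));
            }
        }
        return columns;
    }

    // "Reduce" step: a single pass over one column, returning {min, max}.
    static double[] minMax(List<Double> values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        // First row is from the question; the second is an illustrative sample.
        List<String> rows = List.of("5.3,3.6,1.6,0.3", "4.9,3.0,1.4,0.2");
        groupByColumn(rows).forEach((col, values) -> {
            double[] mm = minMax(values);
            System.out.println("column " + col + ": min=" + mm[0] + ", max=" + mm[1]);
        });
    }
}
```

In the real job the grouping is done by the shuffle phase, so the reducer only sees one column's values at a time.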
My next aim is to apply the equation
v' = [(v − minA)/(maxA − minA)] * (new_maxA − new_minA) + new_minA    (v1)
My new_maxA and new_minA are 0.1 and 0.0 respectively, and I also have each column's max and min.
In order to apply equation (v1) I need v, i.e. the values from the input file.
How do I get them?
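For reference, equation (v1) itself can be sketched as a small Java helper. The class and method names are hypothetical, and mapping a constant column (maxA equal to minA) to the new minimum is an assumption made here to avoid dividing by zero; the equation does not say what to do in that case.

```java
// Hypothetical helper illustrating equation (v1); not part of any library.
class MinMaxScaler {
    static double scale(double v, double minA, double maxA,
                        double newMinA, double newMaxA) {
        if (maxA == minA) {
            // Assumption: a constant column maps to newMinA (avoids division by zero).
            return newMinA;
        }
        return (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA;
    }

    public static void main(String[] args) {
        // A value halfway between min and max lands halfway in the new range.
        System.out.println(scale(5.0, 0.0, 10.0, 0.0, 1.0)); // prints 0.5
    }
}
```

The helper needs both the column's min and max and every original value v, which is why the values have to be visited a second time.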
What I thought was:
Take the first row from the input CSV file (iris dataset)
[5.3,3.6,1.6,0.3]
apply the equation to each attribute, and emit the entire row (the min and max values are known in the reducer itself). But in the reducer I only get the column values. Otherwise I would have to pass my input file as an argument and read it in the reducer's setup().
Is that a best practice? Any suggestions?
UPDATE
As Mark Vickery suggested, I did the following.
// IteratorUtils comes from Apache Commons Collections.
public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
    System.out.println("in reducer");
    // Start from the extremes so negative values are handled correctly.
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    Iterator<DoubleWritable> iterator = values.iterator();
    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(iterator);

    System.out.println("Using ListIterator 1st pass");
    while (lit.hasNext()) {
        // Call next() only once per element; calling it twice skips values.
        DoubleWritable value = lit.next();
        System.out.println(value);
        if (value.get() < min) {
            min = value.get();
        }
        if (value.get() > max) {
            max = value.get();
        }
    }
    System.out.println(min);
    System.out.println(max);

    // Move the list iterator back to the start.
    while (lit.hasPrevious()) {
        lit.previous();
    }

    System.out.println("Using ListIterator 2nd pass");
    while (lit.hasNext()) {
        System.out.println(lit.next());
    }
}
In the 1st pass I get all the values correctly, but in the 2nd pass I only get the same element repeated.
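One plausible explanation for this symptom, offered as an assumption and not a confirmed diagnosis: Hadoop's reduce-side iterator is generally understood to hand back the same DoubleWritable instance on every call and only overwrite its contents, so a list that caches those references ends up holding one object many times. The plain-Java sketch below reproduces that effect with a hypothetical Box class standing in for the Writable; the sample values are arbitrary.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as DoubleWritable.
    static final class Box {
        double value;
    }

    // Returns the SAME Box on every call, overwriting its value each time.
    static Iterator<Box> reusingIterator(double[] data) {
        Box shared = new Box();
        return new Iterator<Box>() {
            int i = 0;
            public boolean hasNext() { return i < data.length; }
            public Box next() { shared.value = data[i++]; return shared; }
        };
    }

    // Caches references: every entry is the same object holding the last value.
    static List<Double> cacheReferences(double[] data) {
        List<Box> cached = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) cached.add(it.next());
        List<Double> secondPass = new ArrayList<>();
        for (Box b : cached) secondPass.add(b.value);
        return secondPass;
    }

    // Copies the primitive out of the Box, so each entry keeps its own value.
    static List<Double> cacheCopies(double[] data) {
        List<Double> cached = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) cached.add(it.next().value);
        return cached;
    }

    public static void main(String[] args) {
        double[] column = {5.3, 4.9, 4.7}; // arbitrary sample values
        System.out.println(cacheReferences(column)); // [4.7, 4.7, 4.7]
        System.out.println(cacheCopies(column));     // [5.3, 4.9, 4.7]
    }
}
```

If that assumption holds, storing value.get() (or a fresh copy of each DoubleWritable) during the first pass would allow a second pass, at the cost of keeping a whole column in the reducer's memory.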