I am trying to write a MapReduce job in which I need to iterate over the values twice.

Given a numerical CSV file, we need to apply this to each column.

For that we need to find the min and max values of the column and use them in the equation (v1).

What I have done so far:

In map()
I emit the column id as the key and each value in that column as the value.
In reduce()
I calculate the min and max values of each column.

After that I am stuck. My next aim is to apply the equation

v1 = [(v − minA) / (maxA − minA)] * (new_maxA − new_minA) + new_minA

My new maxA and new minA are 0.1 and 0.0 respectively, and I also have each column's max and min. In order to apply equation v1 I need to get v, i.e. the values from the input file.

How do I get that?

What I thought was:

From the input CSV file, take the first row (iris dataset)
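
For concreteness, equation v1 can be sketched as a small Java helper. This is only an illustration: the class name `MinMaxScale` and the method `scale` are hypothetical, and the sample column bounds (4.6 and 5.3) are assumed values, not figures computed from the real dataset.

```java
// Hypothetical helper illustrating equation v1 (min-max rescaling).
class MinMaxScale {
    // Maps v from [minA, maxA] into [newMinA, newMaxA].
    static double scale(double v, double minA, double maxA,
                        double newMinA, double newMaxA) {
        if (maxA == minA) {
            // The equation divides by (maxA - minA), so a constant column is undefined.
            throw new IllegalArgumentException("maxA must differ from minA");
        }
        return (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA;
    }

    public static void main(String[] args) {
        // Assumed column bounds; target range 0.0 to 0.1 as stated in the question.
        double minA = 4.6, maxA = 5.3;
        System.out.println(scale(5.3, minA, maxA, 0.0, 0.1)); // column maximum maps to new maxA
        System.out.println(scale(4.6, minA, maxA, 0.0, 0.1)); // column minimum maps to new minA
        System.out.println(scale(4.9, minA, maxA, 0.0, 0.1)); // something in between
    }
}
```

The helper needs v, minA and maxA at the same time, which is why the values of a column have to be visited a second time once the min and max are known.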

[5.3,3.6,1.6,0.3]

Apply the equation to each attribute and emit the entire row (the min and max values are known in the reducer itself). But in the reducer I will only get the column values. Otherwise, I would have to read my input file, passed as an argument, in the setup() of the reducer.

Is that a best practice? Any suggestions?

UPDATE

As Mark Vickery suggested I did the following.

public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException,
    InterruptedException {
System.out.println("in reducer");
// Start from the extremes so that any real value, including a negative one, replaces them
double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
Iterator<DoubleWritable> iterator = values.iterator();
ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(iterator);
System.out.println("Using ListIterator 1st pass");
while(lit.hasNext()){
    // Call next() only once per iteration; calling it twice skips every other value
    DoubleWritable value = lit.next();
    System.out.println(value);
    if (value.get() < min) {
        min = value.get();
    }
    if (value.get() > max) {
        max = value.get();
    }
}
System.out.println(min);
System.out.println(max);

// move the list iterator back to start
while(lit.hasPrevious()){
    lit.previous();
}

System.out.println("Using ListIterator 2nd pass");
double x = 0;
while(lit.hasNext()){
    System.out.println(lit.next());

}
}

In the 1st pass I am able to get all the values correctly. But in the 2nd pass I only get the same element repeatedly.

USB
  • Sorry I had to go to bed. I tested `ListIterator` again this morning and it worked fine. Can you extract a small piece of code into an independent runnable program and reproduce the problem on the ideone.com Java editor? – anubhava Feb 28 '14 at 13:32
  • But it is not working for me, anubhava :(. I tried once more and I am getting the same output. Is there any alternative way? – USB Mar 01 '14 at 03:03
  • @anubhava: Even if I simply iterate with the code that you posted in http://stackoverflow.com/questions/6111248/iterate-twice-on-values it also outputs the same duplicate values. It seems lit.previous() only works once, i.e. if there are 10 elements and lit.previous() is called, the pointer goes only to the 9th element, not to the 1st element. – USB Mar 01 '14 at 04:14

2 Answers

You could enumerate over the reducer values twice in the same reduce. The first time to calculate the Min and Max and the second time to calculate your value and then emit it.

Rough example:

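public void Reduce(string key, List<string> values, Context context)
{
    // First pass: find the column's minimum and maximum
    var minA = Min(values);
    var maxA = Max(values);

    // Second pass: rescale each value into [newMinA, newMaxA] and emit it
    foreach (var v in values)
    {
        var result = ((v - minA) / (maxA - minA)) * (newMaxA - newMinA) + newMinA;

        context.Emit(result);
    }
}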
Mark Vickery
  • Thanks for the reply, Mark Vickery. But I am not able to iterate through the values twice. public void reduce(Text key, Iterable values, Context context) – USB Feb 26 '14 at 03:05
  • If that solves your issue, please have a look: http://stackoverflow.com/questions/6111248/iterate-twice-on-values – Tom Sebastian Feb 26 '14 at 13:11
  • @Tom Sebastian: Yes, I looked into that. But when I applied that code, in the 1st pass I am able to get all the values correctly, and there I can get the min and max values. In the 2nd pass I need to iterate through each value, but I only get the first element (repeatedly). I am not able to get all the values in the 2nd pass. – USB Feb 27 '14 at 10:44
I found the answer. If we try to iterate twice in the reducer as below

    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(it);
    System.out.println("Using ListIterator 1st pass");
    while(lit.hasNext())
        System.out.println(lit.next());

    // move the list iterator back to start
    while(lit.hasPrevious())
        lit.previous();

    System.out.println("Using ListIterator 2nd pass");
    while(lit.hasNext())
        System.out.println(lit.next());

we only get the following output:

Using ListIterator 1st pass
5.3
4.9
5.3
4.6
4.6
Using ListIterator 2nd pass
5.3
5.3
5.3
5.3
5.3
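
One plausible explanation, stated here as an assumption since the thread does not confirm it, is that the reducer hands out the same `DoubleWritable` instance on every call to `next()` and only overwrites its contents. Any wrapper that stores those references then ends up holding one object many times. The sketch below simulates this in plain Java without Hadoop; `MutableDouble`, `ReuseDemo` and `reusingIterable` are hypothetical names standing in for the real classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Writable such as DoubleWritable.
class MutableDouble {
    private double value;
    void set(double v) { value = v; }
    double get() { return value; }
}

class ReuseDemo {
    // Simulates an iterator that returns the SAME object on every next(),
    // overwriting its contents each time (the assumed reducer behaviour).
    static Iterable<MutableDouble> reusingIterable(double... data) {
        final MutableDouble shared = new MutableDouble();
        return () -> new Iterator<MutableDouble>() {
            private int i = 0;
            public boolean hasNext() { return i < data.length; }
            public MutableDouble next() {
                if (i >= data.length) throw new NoSuchElementException();
                shared.set(data[i++]);
                return shared;
            }
        };
    }

    // Stores the references as they arrive: every slot is the same object.
    static List<Double> cacheReferences(Iterable<MutableDouble> values) {
        List<MutableDouble> cache = new ArrayList<>();
        for (MutableDouble v : values) cache.add(v);
        List<Double> secondPass = new ArrayList<>();
        for (MutableDouble v : cache) secondPass.add(v.get());
        return secondPass;
    }

    // Copies the primitive value out before storing it.
    static List<Double> cacheCopies(Iterable<MutableDouble> values) {
        List<Double> cache = new ArrayList<>();
        for (MutableDouble v : values) cache.add(v.get());
        return cache;
    }

    public static void main(String[] args) {
        double[] column = {5.3, 4.9, 5.3, 4.6, 4.6};
        System.out.println(cacheReferences(reusingIterable(column))); // [4.6, 4.6, 4.6, 4.6, 4.6]
        System.out.println(cacheCopies(reusingIterable(column)));     // [5.3, 4.9, 5.3, 4.6, 4.6]
    }
}
```

In this simulation the repeated value is the last one read; which value repeats in a real job may differ, but the symptom of a single value filling the second pass is the same.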

In order to get the right result, we should loop like this:

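// First pass: copy each value into a fresh DoubleWritable before caching it,
// rather than storing the reference handed out by the iterator
ArrayList<DoubleWritable> cache = new ArrayList<DoubleWritable>();
for (DoubleWritable aNum : values) {
    System.out.println("first iteration: " + aNum);
    DoubleWritable writable = new DoubleWritable();
    writable.set(aNum.get());
    cache.add(writable);
}
// Second pass: read the copies back from the cache
int size = cache.size();
for (int i = 0; i < size; ++i) {
    System.out.println("second iteration: " + cache.get(i));
}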

Output

first iteration: 5.3
first iteration: 4.9
first iteration: 5.3
first iteration: 4.6
first iteration: 4.6
second iteration: 5.3
second iteration: 4.9
second iteration: 5.3
second iteration: 4.6
second iteration: 4.6
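
Putting the cache together with the equation from the question, the two passes could look roughly like the sketch below. It is plain Java with no Hadoop dependency so that it runs on its own; `TwoPassScaler` and `rescale` are hypothetical names, and mapping a constant column to the new minimum is a choice made for this sketch, not something the thread specifies. In a real reducer the values would be `DoubleWritable` objects read with `get()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cache the values once, then rescale them in a second pass.
class TwoPassScaler {
    static List<Double> rescale(Iterable<Double> values, double newMin, double newMax) {
        List<Double> cache = new ArrayList<>();
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        // First pass: copy every value and track the column's min and max.
        for (double v : values) {
            cache.add(v);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        // Second pass: apply the min-max equation to the cached copies.
        double range = max - min;
        List<Double> scaled = new ArrayList<>(cache.size());
        for (double v : cache) {
            // Assumption: a constant column (range 0) is mapped to newMin.
            scaled.add(range == 0.0 ? newMin : (v - min) / range * (newMax - newMin) + newMin);
        }
        return scaled;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(5.3, 4.9, 5.3, 4.6, 4.6);
        // Target range 0.0 to 0.1, as in the question.
        System.out.println(rescale(column, 0.0, 0.1));
    }
}
```

The cache holds a full copy of one column's values in memory, so this approach only suits columns small enough to fit in the reducer's heap.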
USB
  • @anubhava: Yes, it does not use an iterator, and here we are making a copy of the values. When the data is huge, storing a duplicate may not be a good idea. Your answer is nice because no copy is created there; it just iterates through the elements. Unfortunately it is not working for me. Did you find any bug in my previous code using the iterator? – USB Mar 03 '14 at 05:43