
The reduce function below contains an operation that is wrong in Hadoop. Could anyone tell me what the problem is and how to solve it?

The pseudocode is as follows:

Algorithm:reduce(String key, Iterator values)

int numDocs = 0
for all v in values do
  numDocs += v;
end for

if numDocs < 2 then
  return none
end if

for all v in values do
  Emit(key,res)
end for
FlyingBurger
  • Could anyone answer my question? – FlyingBurger Apr 02 '18 at 23:44
  • You've consumed the iterator by counting the values... The second loop never happens. What exactly is the problem? What is a "contains operation"? Would you like to show the actual code you're running? http://idownvotedbecau.se/itsnotworking/ – OneCricketeer Apr 03 '18 at 00:17
  • Why does the second loop never happen? – FlyingBurger Apr 03 '18 at 08:31
  • Because that's how iterators work. They are not lists. You cannot loop over them twice without storing the content (assuming we're talking about Java, not your pseudocode) – OneCricketeer Apr 03 '18 at 12:06
  • Possible duplicate of [Iterate twice on values](https://stackoverflow.com/questions/6111248/iterate-twice-on-values) – OneCricketeer Apr 03 '18 at 12:23

1 Answer

If I understand correctly, you are trying to:

  1. Count the length of the iterator
  2. Output nothing when you have fewer than two elements
  3. Otherwise write out all results

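Note that a Java Iterator can only be traversed once: after hasNext() returns false the iterator is exhausted, and the interface has no "reset" method. Your first loop consumes every value, so the second loop has nothing left to read.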
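To see the single-pass behaviour in isolation, here is a minimal sketch in plain Java with no Hadoop involved. The class name `IteratorOnceDemo` and the helper `countRemaining` are made up for this illustration; the sample list stands in for the reducer's values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorOnceDemo {
    // Hypothetical helper: counts the elements still available, consuming them.
    static int countRemaining(Iterator<Integer> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 1, 1);
        Iterator<Integer> values = source.iterator();

        int firstPass = countRemaining(values);   // walks all three elements
        int secondPass = countRemaining(values);  // same iterator, already exhausted

        System.out.println(firstPass + " " + secondPass); // prints "3 0"
    }
}
```

The second call sees zero elements, which is the same thing that happens to the second `for all v in values` loop in the question.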

To read the values twice, you must store them first, for example:

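// Requires java.util.ArrayList and java.util.List
// Buffer the values, since the iterator can only be traversed once
List<Object> docs = new ArrayList<>();
while (values.hasNext()) {
    docs.add(values.next());
}

// Fewer than two values: emit nothing
if (docs.size() < 2) {
    return;
}

// Otherwise write out every buffered value
for (Object v : docs) {
    context.write(key, v);
}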
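The snippet above depends on Hadoop's `context`, so it cannot be run on its own. As a rough, self-contained sketch of the same buffering idea, the version below replaces `context.write` with a hypothetical `emit` callback; the names `BufferedReduceSketch`, `reduce`, and `run` are assumptions for illustration, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

class BufferedReduceSketch {
    // Hypothetical stand-in for a reducer; `emit` plays the role of context.write.
    static <V> void reduce(String key, Iterator<V> values, BiConsumer<String, V> emit) {
        // Single pass over the iterator: copy everything into a list.
        List<V> docs = new ArrayList<>();
        while (values.hasNext()) {
            docs.add(values.next());
        }
        // Fewer than two values: emit nothing.
        if (docs.size() < 2) {
            return;
        }
        // The list, unlike the iterator, can be traversed again.
        for (V v : docs) {
            emit.accept(key, v);
        }
    }

    // Hypothetical driver that collects the emitted pairs as "key=value" strings.
    static List<String> run(String key, List<Integer> input) {
        List<String> out = new ArrayList<>();
        reduce(key, input.iterator(), (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("a", Arrays.asList(1, 1))); // prints "[a=1, a=1]"
        System.out.println(run("b", Arrays.asList(1)));    // prints "[]"
    }
}
```

One caveat when carrying this over to a real reducer: Hadoop typically reuses the same value object between calls to next(), so each value would likely need to be copied before it is added to the list. Buffering also holds every value for a key in memory, which may matter for keys with very many values.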

Alternatively, you might be interested in this answer

OneCricketeer