I am writing a distributed clustering algorithm using Apache Giraph. In the compute() method I need to access the value that each neighbor sent, plus the weight of the edge between the current vertex and the neighbor that sent the message. However, the only message types I see in the Giraph examples are single-value types (DoubleWritable, IntWritable, etc.), which can pass only the value, not the sender information.

How can we access the sender information or the edge information as well?

For instance, in the code below we can get the value of each message, but we do not know which node sent that value to the current node.

public void compute(Iterator<DoubleWritable> msgIterator) {
    ...
    double minDist = isSource() ? 0d : Double.MAX_VALUE;
    while (msgIterator.hasNext()) {
        // Get who sent this message, how?
        minDist = Math.min(minDist, msgIterator.next().get());
    }
    ...
}

Thanks,

Thomas Jungblut

2 Answers

I agree with Thomas Jungblut; writing your own Writable is probably the best (and easiest) solution.

I recently wrote a custom Writable called IntPairWritable which simply holds two Integers. Here's my code.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.giraph.utils.IntPair;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;

public class IntPairWritable extends IntPair implements Writable, Configurable {

    private Configuration conf;

    public IntPairWritable() {
        super(0, 0);
    }

    public IntPairWritable(int fst, int snd) {
        super(fst, snd);
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        super.setFirst(input.readInt());
        super.setSecond(input.readInt());
    }

    @Override
    public void write(DataOutput output) throws IOException {
        output.writeInt(super.getFirst());
        output.writeInt(super.getSecond());
    }

    @Override
    public Configuration getConf() {
        return this.conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public String toString() {
        return super.getFirst() + "," + super.getSecond();
    }
}

Your Writable class could look similar, for example:

public class RetraceableWritable<I extends Writable, D extends Writable> implements Writable, Configurable {
    private I senderId;
    private D data;
    ...

...and so on.


  • Note 1: the default constructor must always exist to ensure that Hadoop can create an instance of your class.
  • Note 2: Giraph seems to like it when everything is configurable so implementing this interface is a good idea.
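To make the idea concrete, here is a minimal sketch of a message that carries the sender's ID together with the value. Everything in it is an assumption made for illustration: the name SenderMessage, the choice of long IDs and double values, and the minDistance helper, which is not a Giraph API. To keep the sketch compilable without Hadoop on the classpath, the class does not declare `implements Writable`; in a real job you would add that (and make the class public in its own file), since write() and readFields() already have the signatures the interface requires.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;

// Hypothetical message type: the sender's vertex ID plus the value it sent.
// In a real Giraph job, declare `implements org.apache.hadoop.io.Writable`.
final class SenderMessage {
    private long senderId;
    private double value;

    // No-argument constructor so the framework can create an instance
    // before calling readFields() (see Note 1 above).
    SenderMessage() {
        this(0L, 0d);
    }

    SenderMessage(long senderId, double value) {
        this.senderId = senderId;
        this.value = value;
    }

    long getSenderId() {
        return senderId;
    }

    double getValue() {
        return value;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput output) throws IOException {
        output.writeLong(senderId);
        output.writeDouble(value);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput input) throws IOException {
        senderId = input.readLong();
        value = input.readDouble();
    }

    // Illustrative helper, not part of Giraph: for each message, look up the
    // weight of the edge to its sender and keep the smallest value + weight.
    // How value and weight are combined depends on your algorithm.
    static double minDistance(Iterable<SenderMessage> messages,
                              Map<Long, Double> edgeWeights,
                              double initial) {
        double minDist = initial;
        for (SenderMessage msg : messages) {
            Double weight = edgeWeights.get(msg.getSenderId());
            if (weight != null) {
                minDist = Math.min(minDist, msg.getValue() + weight);
            }
        }
        return minDist;
    }
}
```

The sending vertex puts its own ID into the message, so the receiver can match each message to one of its edges. The edgeWeights map stands in for whatever edge lookup your Giraph version offers on the vertex.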

Regards

darefilz

As darefilz mentioned, writing your own Writable class would be the best option. The Giraph examples include one, VerifyMessage.java, in which a customized message class is used.

Here's the link: https://apache.googlesource.com/giraph/+/old-move-to-tlp/src/main/java/org/apache/giraph/examples/VerifyMessage.java

nittoor