I have a graph computation that starts with a subset of vertices of a certain type and propagates information through the graph to a set of target vertices, which are also a subset of the graph. I want to output information only from those target vertices, but I don't see a way to do this in the various VertexOutputFormat subclasses, which all seem oriented toward outputting something for every vertex in the graph. How do I do this? E.g., are there hooks in the output phase where I can filter output? Or am I supposed to write a VertexOutputFormat implementation that generates no output for the vertices that have no data? Thanks in advance.

Matthew Cornell

1 Answer

You can simply extend an existing output format class such as TextVertexOutputFormat and add an if-condition; that will do the trick.

For instance, here is a class that writes only the vertices with even ids:

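import java.io.IOException;

import org.apache.giraph.graph.Vertex;
import org.apache.giraph.io.formats.TextVertexOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class ExampleTextVertexOutputFormat extends
    TextVertexOutputFormat<LongWritable, LongWritable, NullWritable> {
  @Override
  public TextVertexWriter createVertexWriter(
          TaskAttemptContext context) throws IOException, InterruptedException {
    return new ExampleTextVertexLineWriter();
  }

  /**
   * Writes one line containing the vertex id for each vertex with an even id;
   * all other vertices produce no output.
   */
  private class ExampleTextVertexLineWriter extends TextVertexWriterToEachLine {
    @Override
    protected Text convertVertexToLine(
        Vertex<LongWritable, LongWritable, NullWritable> vertex) throws IOException {
      // getId() returns a LongWritable, so unwrap it before doing arithmetic
      long id = vertex.getId().get();
      if (id % 2 == 0) {
        return new Text(Long.toString(id));
      }
      // Returning null skips the vertex: the underlying Hadoop line writer
      // emits nothing when both key and value are null
      return null;
    }
  }
}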
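
For the case in the question, the if-condition in convertVertexToLine would test whether the vertex holds any data rather than whether its id is even. The following is only a minimal, library-free sketch of that pattern, so it can be run without Giraph or Hadoop on the classpath. SimpleVertex, hopCount and writeAll are hypothetical stand-ins invented for illustration, not Giraph classes, and the sketch assumes that a null line means "write nothing for this vertex".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types are part of Giraph. */
public class FilteredOutputSketch {

  /** Hypothetical stand-in for a vertex: an id plus a value that may be absent. */
  public static final class SimpleVertex {
    final long id;
    final Long hopCount; // null means no information reached this vertex

    public SimpleVertex(long id, Long hopCount) {
      this.id = id;
      this.hopCount = hopCount;
    }
  }

  /** Plays the role of convertVertexToLine: null means "skip this vertex". */
  public static String convertVertexToLine(SimpleVertex vertex) {
    if (vertex.hopCount == null) {
      return null;
    }
    return vertex.id + "\t" + vertex.hopCount;
  }

  /** Plays the role of the writer loop: lines that are null are dropped. */
  public static List<String> writeAll(List<SimpleVertex> vertices) {
    List<String> lines = new ArrayList<>();
    for (SimpleVertex vertex : vertices) {
      String line = convertVertexToLine(vertex);
      if (line != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    List<SimpleVertex> vertices = Arrays.asList(
        new SimpleVertex(1L, null),
        new SimpleVertex(2L, 3L),
        new SimpleVertex(3L, null),
        new SimpleVertex(4L, 1L));
    // Only vertices 2 and 4 carry a value, so only they are printed
    writeAll(vertices).forEach(System.out::println);
  }
}
```

In a real job the same test would go inside convertVertexToLine of the TextVertexWriterToEachLine subclass, using whatever vertex value type marks a target vertex in your computation.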
peter