I have a graph computation that starts with a subset of vertices of a certain type and propagates information through the graph to a set of target vertices, which are also a subset of the graph. I want to output information only from those target vertices, but I don't see a way to do this in the various VertexOutputFormat subclasses, which all seem oriented to outputting something for every vertex in the graph. How do I do this? E.g., are there hooks for the output phase where I can filter output? Or am I supposed to write a VertexOutputFormat implementation that generates no output for the vertices that have no data? Thanks in advance.
1 Answer
You can simply extend one of the existing output format classes and add an if-condition in its writer; that will do the trick.
For instance, here is a class that writes out only the vertices with even ids:
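// Package names as in Giraph 1.1; adjust them for your version.
import java.io.IOException;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.io.formats.TextVertexOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class ExampleTextVertexOutputFormat extends
    TextVertexOutputFormat<LongWritable, LongWritable, NullWritable> {
  @Override
  public TextVertexWriter createVertexWriter(
      TaskAttemptContext context) throws IOException, InterruptedException {
    return new ExampleTextVertexLineWriter();
  }

  /**
   * Writes one line containing the vertex id for each vertex with an even
   * id; all other vertices are skipped.
   */
  private class ExampleTextVertexLineWriter extends TextVertexWriterToEachLine {
    @Override
    protected Text convertVertexToLine(
        Vertex<LongWritable, LongWritable, NullWritable> vertex) throws IOException {
      // getId() returns a LongWritable, so unwrap it before doing arithmetic.
      if (vertex.getId().get() % 2 == 0) {
        return new Text(vertex.getId().toString());
      }
      // Returning null is meant to skip this vertex: the underlying Hadoop
      // line writer is expected to emit nothing for a null key and null value.
      return null;
    }
  }
}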
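To filter on whether a vertex actually received data rather than on its id, the same idea should carry over: have the converter return null for every vertex you want to leave out. Below is a minimal, self-contained sketch of that contract. Note that `SimpleVertex`, `writeAll` and the class name are hypothetical stand-ins so the example runs without Giraph or Hadoop on the classpath; they are not Giraph API. It also assumes the writer drops null lines, which as far as I know is what Hadoop's text record writer does when both key and value are null, but it is worth checking against your versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in for the "return null to skip a vertex" pattern (not Giraph API). */
public class SkipNullLinesSketch {

    /** Hypothetical minimal vertex: an id plus an optional computed value. */
    public static final class SimpleVertex {
        final long id;
        final Long value; // null means "no data reached this vertex"

        public SimpleVertex(long id, Long value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Plays the role of convertVertexToLine: null means "skip this vertex". */
    public static String convertVertexToLine(SimpleVertex vertex) {
        if (vertex.value == null) {
            return null;
        }
        return vertex.id + "\t" + vertex.value;
    }

    /** Plays the role of the record writer, assumed to ignore null lines. */
    public static List<String> writeAll(List<SimpleVertex> vertices) {
        List<String> out = new ArrayList<>();
        for (SimpleVertex v : vertices) {
            String line = convertVertexToLine(v);
            if (line != null) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleVertex> vertices = Arrays.asList(
            new SimpleVertex(1, null),
            new SimpleVertex(2, 3L),
            new SimpleVertex(3, null),
            new SimpleVertex(4, 1L));
        // Only vertices 2 and 4 carry a value, so only they produce a line.
        writeAll(vertices).forEach(System.out::println);
    }
}
```

In a real job the null check would go inside your own `convertVertexToLine`, testing `vertex.getValue()` against whatever "no data" marker your computation uses.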

peter