I am a novice in this field, but here are my thoughts on it.
You could use a conventional BFS algorithm following the pseudocode below.
At each iteration you launch a Hadoop job that discovers all the child nodes of the current working set that have not been visited yet.
BFS(list curNodes, list visited, int depth) {
    if (depth <= 0) {
        return visited;
    }
    // run a Hadoop job on the current working set curNodes, restricted by visited;
    // the job populates a result list with the child nodes of the current working set
    visited.addAll(result);
    curNodes.clear();
    curNodes.addAll(result);
    return BFS(curNodes, visited, depth - 1);
}
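The same thing can also be written as a plain loop in the driver instead of recursion. This is only a sketch: runLevelJob(...) is a hypothetical helper that would configure and run one Hadoop job for the given frontier and return the newly discovered vertices, and maxDepth is just an illustrative limit.
int maxDepth = 3;                         // how many levels to expand (illustrative)
Set<Integer> visited = new HashSet<Integer>();
Set<Integer> frontier = new HashSet<Integer>();
visited.add(1);                           // start vertex
frontier.add(1);
for (int level = 1; level <= maxDepth && !frontier.isEmpty(); level++) {
    // runLevelJob is hypothetical: it runs one MapReduce pass for this frontier
    // and returns the child vertices that have not been visited yet
    Set<Integer> discovered = runLevelJob(frontier, visited, level);
    visited.addAll(discovered);
    frontier = discovered;
}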
The mapper and reducer of this job will look as shown below.
In this example I just used static members to hold the working set, the visited set and the result set.
It should really be implemented with a temporary file, and there are probably ways to optimize how the temporary data accumulated in one iteration is persisted for the next.
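For example, the driver could read the result of each iteration back from the job's output on HDFS instead of from a static set. This is just a sketch; it assumes the reducer actually writes its unique vertices to the job output through the default TextOutputFormat (one "level<TAB>vertex" line each, see the alternative reducer sketch further down), FileSystem is org.apache.hadoop.fs.FileSystem, and the path is illustrative.
// sketch: read the vertices discovered by one iteration back from the job output
private Set<Integer> readDiscovered(Configuration conf, int level) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path resultPath = new Path("/tmp/bfs/level-" + level + "/part-r-00000"); // illustrative path
    Set<Integer> discovered = new HashSet<Integer>();
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(resultPath)));
    String line;
    while ((line = reader.readLine()) != null) {
        // TextOutputFormat writes "key<TAB>value"; the vertex id is the last token
        String[] parts = line.trim().split("\\s+");
        discovered.add(Integer.parseInt(parts[parts.length - 1]));
    }
    reader.close();
    return discovered;
}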
The input file I used for the job contains a list of tuples, one tuple per line, e.g.
1,2
2,3
5,4
...
...
// imports used by the mapper/reducer and by the driver snippet further down
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.ToolRunner;

public static class VertexMapper extends
        Mapper<Object, Text, IntWritable, IntWritable> {

    // held in static members only for simplicity; see the temp-file remark above
    private static Set<IntWritable> curVertex = null;
    private static IntWritable curLevel = null;
    private static Set<IntWritable> visited = null;

    private IntWritable srcVertex = new IntWritable();
    private IntWritable dstVertex = new IntWritable();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ",");
        if (itr.countTokens() == 2) {
            String keyStr = itr.nextToken();
            String valueStr = itr.nextToken();
            try {
                srcVertex.set(Integer.parseInt(keyStr));
                dstVertex.set(Integer.parseInt(valueStr));
                // emit the child vertex if its parent is in the working set,
                // it has not been visited yet and the edge is not a self-loop
                if (VertexMapper.curVertex.contains(srcVertex)
                        && !VertexMapper.visited.contains(dstVertex)
                        && !srcVertex.equals(dstVertex)) {
                    context.write(VertexMapper.curLevel, dstVertex);
                }
            } catch (NumberFormatException e) {
                System.err.println("Found key,value <" + keyStr + "," + valueStr
                        + "> which cannot be parsed as int");
            }
        } else {
            System.err.println("Found malformed line: " + value.toString());
        }
    }
}
public static class UniqueReducer extends
        Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private static Set<IntWritable> result = new HashSet<IntWritable>();

    public void reduce(IntWritable key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        // collect the distinct child vertices of this level into the static result set;
        // nothing is written to the job output (NullOutputFormat is used below)
        for (IntWritable val : values) {
            UniqueReducer.result.add(new IntWritable(val.get()));
        }
        // context.write(key, key);
    }
}
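If you prefer the result to land in a file rather than in a static member (in line with the temp-file remark above), a variation of the reducer could simply emit each distinct child vertex once. This is a sketch, not the code wired into the job below; it would also require replacing NullOutputFormat with a regular output path via FileOutputFormat.setOutputPath(...).
public static class UniqueToFileReducer extends
        Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    public void reduce(IntWritable key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        // deduplicate the child vertices of this level and write them to the job output,
        // so the next iteration can read them back from HDFS instead of a static set
        Set<IntWritable> unique = new HashSet<IntWritable>();
        for (IntWritable val : values) {
            unique.add(new IntWritable(val.get()));
        }
        for (IntWritable vertex : unique) {
            context.write(key, vertex);
        }
    }
}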
Running the job will look something like this:
UniqueReducer.result.clear();
VertexMapper.curLevel = new IntWritable(1);
VertexMapper.curVertex = new HashSet<IntWritable>(1);
VertexMapper.curVertex.add(new IntWritable(1));   // start BFS from vertex 1
VertexMapper.visited = new HashSet<IntWritable>(1);
VertexMapper.visited.add(new IntWritable(1));

Configuration conf = getConf();
Job job = new Job(conf, "BFS");
job.setJarByClass(BFSExample.class);
job.setMapperClass(VertexMapper.class);
job.setCombinerClass(UniqueReducer.class);
job.setReducerClass(UniqueReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
job.setOutputFormatClass(NullOutputFormat.class);
boolean succeeded = job.waitForCompletion(true);

// and the driver itself is launched through ToolRunner, e.g. from main():
BFSExample bfs = new BFSExample();
ToolRunner.run(new Configuration(), bfs, args);
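For completeness, the job-launching code above assumes it lives inside the run() method of a class that implements Tool, so that getConf() and ToolRunner apply. A minimal skeleton could look like the following (the class name comes from the snippet above; it additionally needs org.apache.hadoop.conf.Configured and org.apache.hadoop.util.Tool imported):
public class BFSExample extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // the set initialization and the job setup shown above go here
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new BFSExample(), args);
        System.exit(exitCode);
    }
}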