
Do you know how I can implement this algorithm using the MapReduce paradigm?

def getFriends(self, degree):
    friendList = []
    self._getFriends(degree, friendList)
    return friendList

def _getFriends(self, degree, friendList):
    friendList.append(self)
    if degree:
        for friend in self.friends:
            friend._getFriends(degree-1, friendList)

Let's say that we have the following bi-directional friendships:

(1,2), (1,3), (1,4), (4,5), (4,6), (5,7), (5,8)

How can I, for example, get the 1st, 2nd and 3rd degree connections of user 1? The answer should be 1 -> 2, 3, 4, 5, 6, 7, 8

Thanks

pm3310

3 Answers


Maybe you can use Hive, which supports SQL-like queries!

minicaptain

As far as I understand, you want to collect all friends within the n-th circle of some person in a social graph. Most graph algorithms are recursive, and recursion is not well suited to the MapReduce way of solving tasks.

I can suggest using Apache Giraph to solve this problem (it actually uses MapReduce under the hood). It's mostly asynchronous, and you write your jobs by describing the behaviour of a single node, like this (a plain-Java sketch of the idea follows the list):

1. Send a message from the root node to all friends to get their friend lists.
2.1. Each friend sends a message with its friend list to the root node.
2.2. Each friend sends a message to all its sub-friends to get their friend lists.
3.1. Each sub-friend sends a message with its friend list to the root node.
3.2. Each sub-friend sends a message to all its sub-sub-friends to get their friend lists.
...
N. The root node collects all these messages and merges them into a single list.
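
To make that scheme more concrete, here is a minimal local sketch of the same message passing in plain Java (this is not actual Giraph code; one loop iteration plays the role of one superstep, and the graph is the friendship list from the question):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    public class MessagePassingSketch {

        public static void main(String[] args) {
            // Adjacency list for the bidirectional friendships from the question.
            Map<Integer, Set<Integer>> graph = new HashMap<>();
            int[][] edges = {{1, 2}, {1, 3}, {1, 4}, {4, 5}, {4, 6}, {5, 7}, {5, 8}};
            for (int[] e : edges) {
                graph.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
                graph.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
            }

            int root = 1;
            int degree = 3;

            // "Messages" received by the root node, i.e. the collected circles.
            Set<Integer> collected = new HashSet<>();
            // Frontier = nodes that received the "send me your friend list" request.
            Set<Integer> frontier = new HashSet<>(Collections.singleton(root));
            Set<Integer> seen = new HashSet<>(frontier);

            // One loop iteration corresponds to one superstep of the vertex program.
            for (int step = 0; step < degree; step++) {
                Set<Integer> nextFrontier = new HashSet<>();
                for (int node : frontier) {
                    for (int friend : graph.getOrDefault(node, Collections.<Integer>emptySet())) {
                        collected.add(friend);        // message back to the root
                        if (seen.add(friend)) {
                            nextFrontier.add(friend); // request forwarded one hop further
                        }
                    }
                }
                frontier = nextFrontier;
            }

            collected.remove(root);
            System.out.println(root + " -> " + new TreeSet<>(collected)); // 1 -> [2, 3, 4, 5, 6, 7, 8]
        }
    }

In a real Giraph job the frontier and the collected messages live on the worker nodes instead of in local sets, but the flow of messages is the same.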

You can also use a cascade of MapReduce jobs to collect the circles, although it's not a very efficient way to solve the task (a rough driver sketch follows below):

  1. Export the root user's friends to a file circle-001
  2. Use circle-001 as the input to a job that exports each circle-001 user's friends to a file circle-002
  3. Do the same, but use circle-002 as the input
  4. ...
  5. Repeat N times

The first approach is more suitable if you have a lot of users whose circles need to be calculated. The second has the huge overhead of starting multiple MR jobs, but it's much simpler and is fine for a small input set of users.
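
A rough sketch of how that cascade driver could be wired up with the plain Hadoop Job API is shown below; ExpandMapper and ExpandReducer are placeholder names (a real mapper would join the edge list against the previous circle and emit the newly reachable friends), not code from this answer:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CircleCascadeDriver {

        // Placeholder mapper: a real one would join the edge list against the previous
        // circle and emit each newly reachable friend; here it just passes lines through.
        public static class ExpandMapper extends Mapper<Object, Text, Text, NullWritable> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, NullWritable.get());
            }
        }

        // Placeholder reducer: deduplicates whatever the mapper emitted.
        public static class ExpandReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
            @Override
            protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            int degrees = 3;                // how many circles to collect
            String previous = "circle-001"; // seed file: the root user's direct friends

            // Each iteration launches one MR job that expands the previous circle by one hop.
            for (int i = 2; i <= degrees; i++) {
                String next = String.format("circle-%03d", i);
                Job job = new Job(conf, next);
                job.setJarByClass(CircleCascadeDriver.class);
                job.setMapperClass(ExpandMapper.class);
                job.setReducerClass(ExpandReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(NullWritable.class);
                FileInputFormat.addInputPath(job, new Path(previous));
                FileOutputFormat.setOutputPath(job, new Path(next));
                if (!job.waitForCompletion(true)) {
                    throw new IllegalStateException("Job for " + next + " failed");
                }
                previous = next;            // the new circle feeds the next iteration
            }
        }
    }

Merging circle-001 up to circle-00N at the end gives all connections up to the N-th degree.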

shutty
  • Thank you for the response. I need to do it only on MapReduce :-( – pm3310 Jul 26 '13 at 13:01
  • Can you explain in more detail how I can cascade map-reduce jobs to collect circles, please? – pm3310 Jul 26 '13 at 13:24
  • Updated original answer to describe second approach. Note that Giraph is actually a MapReduce job under the hood. It's just a layer of abstraction to deal with large graphs on Hadoop. – shutty Jul 26 '13 at 15:11
  • Thanks a lot for your responses. Check my new updated initial question to make clearer what I need to achieve – pm3310 Jul 26 '13 at 15:31

I am a novice in this field, but here are my thoughts on that.

You could use a conventional BFS algorithm, following the pseudocode below.

At each iteration you launch a Hadoop job that discovers all the child nodes of the current working set that have not yet been visited.

List<Integer> BFS(List<Integer> curNodes, List<Integer> visited, int depth) {
    if (depth <= 0) {
        return visited;
    }

    // run a Hadoop job on the current working set curNodes, restricted by visited

    // the job will populate some result list with the child nodes of the current working set

    // then,

    visited.addAll(result);
    curNodes.clear();
    curNodes.addAll(result);

    return BFS(curNodes, visited, depth - 1);
}

The mapper and reducer of this job will look as shown below.

In this example I just used static members to hold the working set, the visited set and the result set.

It should really have been implemented using a temp file. There are probably ways to optimize the persistence of the temporary data accumulated from one iteration to the next.

The input file I used for the job contains a list of tuples, one tuple per line, e.g. 1,2 2,3 5,4 ...
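
For example, with the friendships from the question the input file could look like this (each pair listed in both directions, since the mapper below only matches the first column against the current working set):

    1,2
    2,1
    1,3
    3,1
    1,4
    4,1
    4,5
    5,4
    4,6
    6,4
    5,7
    7,5
    5,8
    8,5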

  public static class VertexMapper extends
      Mapper<Object, Text, IntWritable, IntWritable> {

    // Working state shared with the driver (see the note above about using
    // static members instead of temp files).
    private static Set<IntWritable> curVertex = null;
    private static IntWritable curLevel = null;
    private static Set<IntWritable> visited = null;

    // Parsed endpoints of the current edge.
    private IntWritable key = new IntWritable();
    private IntWritable value = new IntWritable();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {

      StringTokenizer itr = new StringTokenizer(value.toString(), ",");
      if (itr.countTokens() == 2) {
        String keyStr = itr.nextToken();
        String valueStr = itr.nextToken();
        try {
          this.key.set(Integer.parseInt(keyStr));
          this.value.set(Integer.parseInt(valueStr));

          // Emit the destination vertex if the source is in the current working set,
          // the destination has not been visited yet, and the edge is not a self-loop.
          if (VertexMapper.curVertex.contains(this.key)
              && !VertexMapper.visited.contains(this.value)
              && !this.key.equals(this.value)) {
            context.write(VertexMapper.curLevel, this.value);
          }
        } catch (NumberFormatException e) {
          System.err.println("Found key,value <" + keyStr + "," + valueStr
              + "> which cannot be parsed as int");
        }
      } else {
        System.err.println("Found malformed line: " + value.toString());
      }
    }
  }

  public static class UniqueReducer extends
      Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    // Deduplicated set of child nodes discovered in this iteration,
    // again collected via a static member (same caveat as in the mapper).
    private static Set<IntWritable> result = new HashSet<IntWritable>();

    public void reduce(IntWritable key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {

      for (IntWritable val : values) {
        UniqueReducer.result.add(new IntWritable(val.get()));
      }
      // context.write(key, key);
    }
  }

Running a job will look something like this:

// Reset the shared state: the BFS starts from user 1 at level 1.
UniqueReducer.result.clear();
VertexMapper.curLevel = new IntWritable(1);
VertexMapper.curVertex = new HashSet<IntWritable>(1);
VertexMapper.curVertex.add(new IntWritable(1));
VertexMapper.visited = new HashSet<IntWritable>(1);
VertexMapper.visited.add(new IntWritable(1));

Configuration conf = getConf();
Job job = new Job(conf, "BFS");
job.setJarByClass(BFSExample.class);
job.setMapperClass(VertexMapper.class);
job.setCombinerClass(UniqueReducer.class);
job.setReducerClass(UniqueReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
// No file output is needed: the result is accumulated in UniqueReducer.result.
job.setOutputFormatClass(NullOutputFormat.class);
boolean result = job.waitForCompletion(true);

BFSExample bfs = new BFSExample();
ToolRunner.run(new Configuration(), bfs, args);
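
To actually answer the original question (the 1st, 2nd and 3rd degree connections of user 1), the job above has to be launched once per BFS level, as in the pseudocode earlier. A minimal sketch of that wiring, keeping the static-field approach of this answer (runJob() is a hypothetical helper standing in for the job setup shown above):

    Set<IntWritable> curNodes = new HashSet<IntWritable>();
    Set<IntWritable> visited = new HashSet<IntWritable>();
    curNodes.add(new IntWritable(1));              // start the BFS from user 1
    visited.add(new IntWritable(1));

    int depth = 3;                                 // 1st, 2nd and 3rd degree connections
    for (int level = 1; level <= depth; level++) {
        UniqueReducer.result.clear();
        VertexMapper.curLevel = new IntWritable(level);
        VertexMapper.curVertex = new HashSet<IntWritable>(curNodes);
        VertexMapper.visited = new HashSet<IntWritable>(visited);

        runJob();                                  // hypothetical helper: one Hadoop job per level, configured as above

        visited.addAll(UniqueReducer.result);      // everything discovered so far
        curNodes = new HashSet<IntWritable>(UniqueReducer.result); // next frontier
    }

    visited.remove(new IntWritable(1));            // drop the root itself
    System.out.println("1 -> " + visited);         // expected: 2, 3, 4, 5, 6, 7, 8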