
I want to know how I can find interesting relationships between user accounts, such as the most connected users or the most valuable users based on their connections to others.

Below are the two tables I use. One holds all the users; the other holds the keys of the users each user follows.

User {
    id,
    name
}

Follows {
    user_id -> user.id,
    following_id -> user.id
}
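To run graph algorithms over this schema in a program, a common first step is to load the Follows rows into adjacency sets. Below is a minimal Python sketch. It assumes SQLite (through the standard `sqlite3` module) and the table and column names from the question. The `demo_connection()` helper and its sample users are hypothetical and exist only for illustration. `"User"` is quoted because `user` is a reserved word in some SQL dialects.

```python
import sqlite3


def demo_connection():
    """Hypothetical in-memory copy of the question's schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Follows (
            user_id      INTEGER NOT NULL REFERENCES "User"(id),
            following_id INTEGER NOT NULL REFERENCES "User"(id),
            PRIMARY KEY (user_id, following_id)
        );
        INSERT INTO "User" VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
        INSERT INTO Follows VALUES (1, 2), (3, 2), (4, 2), (2, 3);
    """)
    return conn


def load_graph(conn):
    """Build adjacency sets from the Follows table.

    following[u] holds the ids u follows (out-links);
    followers[u] holds the ids that follow u (in-links).
    """
    users = [row[0] for row in conn.execute('SELECT id FROM "User" ORDER BY id')]
    following = {u: set() for u in users}
    followers = {u: set() for u in users}
    for user_id, following_id in conn.execute(
        "SELECT user_id, following_id FROM Follows"
    ):
        following[user_id].add(following_id)
        followers[following_id].add(user_id)
    return users, following, followers


users, following, followers = load_graph(demo_connection())
print(sorted(followers[2]))  # [1, 3, 4]
```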

What type of algorithms am I looking for?

Assuming unimportant people have few or no followers, how can I find the people at the center of the graph? I would expect them to be important because important people follow them.
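The idea described here is that importance flows in from important followers. That is the intuition behind eigenvector-style centrality measures such as PageRank, a technique not named elsewhere in this thread. Below is a minimal power-iteration sketch in Python. It assumes the follow edges are available as `(user_id, following_id)` pairs, i.e. rows of Follows, and that every id in those pairs also appears in `users`. The damping factor 0.85 is the conventional default, not a value tuned for any particular site.

```python
def pagerank(users, edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a follow graph.

    users: list of user ids; edges: (user_id, following_id) pairs, i.e. the
    rows of Follows. Each follow passes part of the follower's score to the
    followed user, so being followed by high-scoring users raises your score.
    """
    n = len(users)
    out_links = {u: [] for u in users}
    for follower, followed in edges:
        out_links[follower].append(followed)

    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who follow nobody ("dangling" nodes) spread their score evenly.
        dangling = sum(rank[u] for u in users if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in users}
        for u in users:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        # Stop early once the scores have stopped changing.
        delta = sum(abs(new_rank[u] - rank[u]) for u in users)
        rank = new_rank
        if delta < tol:
            break
    return rank  # scores sum to 1; higher means more central


ranks = pagerank([1, 2, 3, 4], [(1, 2), (3, 2), (4, 2), (2, 3)])
```

In this small example, user 3 has a single follower. That follower is user 2, the most-followed user, so user 3 still outranks users 1 and 4, who have no followers at all. Raw follower counts cannot capture this effect.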

Update

As David and Steve point out, how close given nodes are, which nodes form sub-communities, and which users are the most connected are all examples of useful data that can be pulled from this schema.

Since this "follower" design is used by many sites now, I've started a bounty in the hopes of getting some solid SQL or programming language implementations that might be useful to a wide variety of people.

It's worth noting that while the results of some algorithms are merely fascinating, others (such as finding related nodes) would be valuable to the users of our sites, because we could use them to recommend things to those users.
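One simple way to find related users is the "friends of friends" heuristic. It suggests accounts followed by the people a user follows, ranked by how many of those people follow them. This is an assumption offered for illustration, not a method proposed in this thread. Below is a SQL sketch run through Python's `sqlite3` module. It assumes the question's Follows table; `recommend()` and the sample rows in `demo_connection()` are hypothetical.

```python
import sqlite3

# Candidates for :me are users followed by someone :me follows, excluding
# :me and anyone :me already follows; rank by how many of :me's followees
# follow each candidate.
RECOMMEND_SQL = """
SELECT f2.following_id AS candidate, COUNT(*) AS mutual
FROM Follows AS f1
JOIN Follows AS f2 ON f2.user_id = f1.following_id
WHERE f1.user_id = :me
  AND f2.following_id <> :me
  AND f2.following_id NOT IN (
        SELECT following_id FROM Follows WHERE user_id = :me)
GROUP BY f2.following_id
ORDER BY mutual DESC, candidate
LIMIT :max_rows
"""


def demo_connection():
    """Hypothetical in-memory Follows table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Follows (
            user_id      INTEGER NOT NULL,
            following_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, following_id)
        );
        INSERT INTO Follows VALUES (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 5);
    """)
    return conn


def recommend(conn, me, limit=10):
    """Return (candidate_id, mutual_count) rows, best candidates first."""
    return conn.execute(RECOMMEND_SQL, {"me": me, "max_rows": limit}).fetchall()


print(recommend(demo_connection(), 1))  # [(4, 2), (5, 1)]
```

With this sample data, user 1 follows users 2 and 3. Both of them follow user 4, and only user 3 follows user 5, so user 4 is the stronger suggestion.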

Xeoncross
  • There are a huge number of things you could find out here; can you give any more info about what's most important? For instance, are you just looking to find the most connected users, sub-communities, weak links (users who link groups together), similar users, and so on? – Steve Jan 15 '12 at 11:05
  • At this point I am unsure what information or patterns can be derived from the available data. All of the ideas you listed, @steve, would make great examples for further research. – Xeoncross Jan 16 '12 at 16:42

1 Answer


If you focus only on the links, try these popular centrality measures (assume G is the graph):

  1. Degree: the degree of node $i$ is defined as $k_i/(N-1)$, where $k_i$ is the number of links to node $i$ (in a follow graph, its number of followers) and $N$ is the total number of nodes. A higher degree means a more important node.
  2. Closeness: the closeness of node $i$ is defined as $(N-1) / \sum_{j \in G} d_{ij}$, where $d_{ij}$ is the distance (shortest-path length) between nodes $i$ and $j$. This emphasizes how close a node is to all other nodes in the social network. The sum is only finite when every node is reachable from $i$.
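  3. Betweenness: the betweenness of node $i$ is defined as $\frac{1}{(N-1)(N-2)} \sum_{j \neq k;\; j, k \neq i} \frac{n_{jk}(i)}{n_{jk}}$, where $n_{jk}$ is the number of shortest paths from node $j$ to node $k$, and $n_{jk}(i)$ is the number of those paths that pass through node $i$. Summing over ordered pairs suits a directed follow graph. For an undirected graph, sum over pairs $j < k$ and multiply by 2 instead. A high betweenness means that many shortest paths between other pairs of nodes pass through node $i$, so $i$ acts as a hub or bridge in the network.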
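The degree measure translates directly into SQL: in a follow graph, $k_i$ is the number of Follows rows whose `following_id` is $i$. Below is a sketch using Python's built-in `sqlite3` module. It assumes the question's table and column names; the sample data in `demo_connection()` is hypothetical. The `LEFT JOIN` keeps users with no followers (degree 0), and multiplying by `1.0` avoids integer division.

```python
import sqlite3

# In-degree centrality k_i / (N - 1), highest first.
DEGREE_SQL = """
SELECT u.id,
       u.name,
       COUNT(f.user_id) * 1.0 / ((SELECT COUNT(*) FROM "User") - 1) AS degree
FROM "User" AS u
LEFT JOIN Follows AS f ON f.following_id = u.id
GROUP BY u.id, u.name
ORDER BY degree DESC, u.id
"""


def demo_connection():
    """Hypothetical in-memory copy of the question's schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Follows (
            user_id      INTEGER NOT NULL REFERENCES "User"(id),
            following_id INTEGER NOT NULL REFERENCES "User"(id),
            PRIMARY KEY (user_id, following_id)
        );
        INSERT INTO "User" VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
        INSERT INTO Follows VALUES (1, 2), (3, 2), (4, 2), (2, 3);
    """)
    return conn


def degree_centrality(conn):
    """Return (id, name, degree) rows sorted from most to least followed."""
    return conn.execute(DEGREE_SQL).fetchall()


for row in degree_centrality(demo_connection()):
    print(row)
```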

All of the above measures can be computed from the link information alone, and you can use one of them or combine several to find the important node(s) in the social network. That said, depending on how you define "important", you may need other measures.
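Closeness and betweenness both need shortest paths, which are easier to compute in application code than in plain SQL. Below is a Python sketch. It assumes the Follows table has already been loaded into a dict `adj` that maps each user id to the ids that user follows. It also assumes every id in `adj` appears in `nodes`. Paths follow the direction of the follow edges. In a real follow graph many users cannot reach each other, so `closeness` sums only over reachable users and uses their count in place of $N-1$. That substitution is an assumption and not part of the definition above. `betweenness` uses Brandes' algorithm: it sums over ordered pairs and divides by $(N-1)(N-2)$, which is the directed-graph normalization.

```python
from collections import deque


def bfs_paths(adj, s):
    """BFS from s: visit order, distances, shortest-path counts, predecessors."""
    dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj.get(v, ()):
            if w not in dist:  # first time w is reached
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def closeness(adj, nodes):
    """Reachable count / sum of distances; 0.0 if i reaches nobody."""
    result = {}
    for i in nodes:
        _, dist, _, _ = bfs_paths(adj, i)
        total = sum(dist.values())  # d_ii = 0 contributes nothing
        reachable = len(dist) - 1
        result[i] = reachable / total if total > 0 else 0.0
    return result


def betweenness(adj, nodes):
    """Brandes' algorithm, normalized by (N-1)(N-2) for a directed graph."""
    n = len(nodes)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        order, _, sigma, preds = bfs_paths(adj, s)
        delta = {v: 0.0 for v in order}
        # Walk back from the farthest nodes, accumulating path dependencies.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in bc.items()}


adj = {1: [2], 2: [3], 3: []}
print(betweenness(adj, [1, 2, 3]))  # {1: 0.0, 2: 0.5, 3: 0.0}
```

In the example chain 1 → 2 → 3, only the path from user 1 to user 3 passes through another node (user 2). Its raw count of 1 is scaled by $1/(2 \cdot 1)$.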

Ddavid
  • These concepts are great examples of things I was looking for. However, I'm afraid most of this math goes over my head when trying to convert to programming logic or SQL queries. – Xeoncross Jan 16 '12 at 16:44
  • Unfortunately, I don't have working code for this question. Maybe you could ask the authors of this paper, [**WMR--A Graph-Based Algorithm for Friend Recommendation**](http://dl.acm.org/citation.cfm?id=1249071), or of other papers on recommendation systems for social networks, for their code? I think this research is among the most closely related to your question. – Ddavid Jan 17 '12 at 00:55