Let G = (V,E) be a Directed Acyclic Graph (DAG), where V is the set of vertices and E is the set of edges.

Now, suppose that G is corrupted by some annotators in a crowd, according to the crowdsourcing paradigm:

  • Some of them may decide to remove an edge e belonging to E
  • Some of them may decide to add an edge e that did not exist in E

The result of the work of annotator i is a graph whose set of vertices V is the same as the original one and whose set of edges Ei may differ from the original E. If n is the number of annotators, we end up with n different graphs that share the same set of vertices V but may have different sets of edges. Let G1 = (V,E1), ..., Gn = (V,En) be these graphs.

I would like to know whether there is a way of merging these graphs so as to find a consensus on the presence or absence of each possible edge e between two vertices v1, v2 in V. The purpose of this operation is to fuse the opinions of the annotators about the set of edges E of the graph G. The final graph has to be a DAG.

Eleanore
  • Does the resulting graph have to be a DAG? If so, the problem (if you want to accept the maximum number of edge insertions) generalizes the NP-hard Feedback Arc Set problem, and will probably be hard to solve optimally. But this depends on what the actual objective is. – Falk Hüffner Apr 29 '13 at 12:07
  • @FalkHüffner yes, it has to be a DAG. I have heard about the solution you are proposing; however, I am not able to apply it since I am not familiar with it. Nevertheless, I will update the question text. – Eleanore May 02 '13 at 15:06

2 Answers


Let...

  • U be the distinct union of all Ei sets plus the original set E
  • T be some arbitrary threshold value
  • H(x) be some heuristic function
  • F be the final consensus set of edges

Pseudocode:

for each Edge e in U
   if H(e) >= T then F.Add(e)

The question is then of course how to define your heuristic function. A naive approach would be set-based voting: count the number of Ei sets containing the edge, and if enough annotators agree that it is in the graph, include it. This is a simple and efficient function to implement. Its weaknesses are that it cannot detect or compensate for bad annotators, and that it is unreliable for small crowd sizes.
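As an illustration, the voting heuristic could be sketched in Python as below. This is only a minimal sketch under assumptions: edges are represented as `(source, target)` tuples, each annotator contributes one set of edges, and the names `vote_counts` and `consensus_edges` are made up for this example. Edges that no annotator kept simply receive zero votes.

```python
from collections import Counter

def vote_counts(edge_sets):
    """H(e): for every edge, the number of annotators whose edge set contains it."""
    counts = Counter()
    for edges in edge_sets:
        # set() guards against one annotator listing the same edge twice
        counts.update(set(edges))
    return counts

def consensus_edges(edge_sets, threshold):
    """F: all edges whose vote count reaches the threshold T."""
    counts = vote_counts(edge_sets)
    return {edge for edge, votes in counts.items() if votes >= threshold}

# Hypothetical annotations from three annotators over vertices a, b, c
annotations = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "a")},
    {("a", "b"), ("b", "c"), ("a", "c")},
]
F = consensus_edges(annotations, threshold=2)
```

With a threshold of 2, only the edges a→b (three votes) and b→c (two votes) are kept. Note that nothing here prevents the result from containing a cycle.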

Esoteric Screen Name
  • Thank you for your answer. However, doing so can introduce cycles, while my objective is to find a DAG. Maybe a constraint can be introduced as follows: when a cycle is detected, the arc with the smallest "vote" is removed. – Eleanore May 02 '13 at 15:12
  • Ah, that's a good point. You'd have to add the constraint at the merge level. Disallowing annotating in cycles wouldn't be sufficient, since one could be produced by a set of independent annotations. – Esoteric Screen Name May 03 '13 at 13:13
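One way to enforce the constraint at the merge level, sketched here as a greedy variant of the idea in the comments above rather than as a definitive method: consider the edges that pass the vote threshold in order of decreasing support, and skip any edge that would close a cycle with the edges already accepted. The weakest edge of each cycle is therefore the one that gets dropped. The function names and the tuple representation of edges are assumptions made for this example, and vertices are assumed to be sortable so that ties are broken deterministically.

```python
from collections import Counter, defaultdict

def reaches(adj, start, goal):
    """Return True if goal is reachable from start (iterative DFS)."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def acyclic_consensus(edge_sets, threshold):
    """Greedy merge: accept well-supported edges unless they would create a cycle."""
    counts = Counter()
    for edges in edge_sets:
        counts.update(set(edges))
    candidates = [e for e, votes in counts.items() if votes >= threshold]
    # Strongest support first; ties broken by the edge itself for determinism
    candidates.sort(key=lambda e: (-counts[e], e))
    adj = defaultdict(set)
    accepted = []
    for u, v in candidates:
        # Adding u -> v closes a cycle exactly when u is already reachable from v
        if u == v or reaches(adj, v, u):
            continue
        adj[u].add(v)
        accepted.append((u, v))
    return accepted

# Hypothetical example: two annotators independently close the cycle a -> b -> c -> a
annotations = [
    {("a", "b"), ("b", "c"), ("c", "a")},
    {("a", "b"), ("b", "c"), ("c", "a")},
    {("a", "b"), ("b", "c")},
]
merged = acyclic_consensus(annotations, threshold=2)
```

Here c→a has the fewest votes among the three cycle edges, so it is the one left out. This greedy rule always yields a DAG, but it is only a heuristic: as noted in the comments on the question, keeping the maximum number of edges is related to the NP-hard Feedback Arc Set problem.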

For each edge, count the number of graphs that contain it. If the count is greater than some threshold, assume it was an original edge.

You may face some problems if the annotators' actions are biased, that is, if users do not choose the edges they act upon uniformly at random.

ElKamina