5

I am trying to understand the relevance score that OpenCalais returns for each entity. What does it signify, and how should it be interpreted? I would be thankful for any insights.

Ninja

1 Answer

5

Their documentation states: The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important).

While they do not explain exactly what 'relevance' means, one would expect it to quantify how central the entity is to the discourse of the document. It is likely influenced by factors such as the entity's mention frequency in this document compared with its expected frequency in a random document (cf. TF-IDF), but it could also involve more sophisticated discourse analysis.
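
To make the TF-IDF comparison concrete, here is a minimal sketch of how a frequency-based relevance score on a 0-1 scale could be computed. This is not OpenCalais's algorithm, which is not documented; the function name `tfidf_relevance`, the smoothed IDF formula, and the divide-by-maximum normalization are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_relevance(doc_entities, corpus_entities):
    """Score each entity in one document on a 0-1 scale, TF-IDF style.

    doc_entities: list of entity mentions found in the target document.
    corpus_entities: list of lists, the entity mentions of each
        background document (a hypothetical reference corpus).
    """
    n_docs = len(corpus_entities)

    # Document frequency: number of background documents mentioning each entity.
    df = Counter()
    for entities in corpus_entities:
        df.update(set(entities))

    # Term frequency: how often each entity is mentioned in this document.
    tf = Counter(doc_entities)

    raw = {}
    for entity, count in tf.items():
        # Smoothed IDF, so entities absent from the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df[entity])) + 1
        raw[entity] = count * idf

    # Rescale so the top entity scores 1.0 (an arbitrary normalization choice).
    top = max(raw.values(), default=0.0)
    return {e: s / top for e, s in raw.items()} if top else {}
```

An entity mentioned often in this document but rarely elsewhere ends up near 1, while one that appears in almost every document ends up near 0. That is the behaviour one would expect from a "centrality" score, whatever the real implementation is.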

John Lehmann
  • Thanks John. Do you think it is a probabilistic score? As they are expressing it as a percentage, I was wondering if it could be a probabilistic score but I have a strong feeling it is not. What is your take on this? My application maps the text documents to important entities in them. What range of the score do you think would indicate that it is important enough to be mapped to the document? I would be thankful for your response. – Ninja Jan 09 '11 at 10:47
  • Hi Ninja. I can only speculate, but I would guess that it is not a true probability. My best advice would be to run some experiments and see what looks good to you. I bet you could pick out a reasonable threshold within 10 or 15 minutes of data analysis. You'll be forced to make the decision on which is more important: missing important entities, or including questionable ones. But it's fairly subjective. – John Lehmann Jan 09 '11 at 18:50
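
The experiment suggested above could be sketched roughly as follows: hand-label which entities are important in a small sample, then see how precision (few questionable entities included) and recall (few important entities missed) change as the cutoff moves. The function name `sweep_thresholds` and its inputs are hypothetical, and nothing here is part of the OpenCalais API.

```python
def sweep_thresholds(scored, important, thresholds):
    """Return a (threshold, precision, recall) tuple for each candidate cutoff.

    scored: dict mapping entity -> relevance score taken from the response.
    important: set of entities a human judged important (hand-labeled).
    thresholds: iterable of candidate cutoffs in the 0-1 range.
    """
    rows = []
    for t in thresholds:
        # Entities that would be mapped to the document at this cutoff.
        kept = {e for e, s in scored.items() if s >= t}
        hits = len(kept & important)
        # Precision falls when questionable entities are included.
        precision = hits / len(kept) if kept else 1.0
        # Recall falls when important entities are missed.
        recall = hits / len(important) if important else 1.0
        rows.append((t, precision, recall))
    return rows
```

Printing the rows for a few dozen labeled documents should show where the trade-off tips. Which end of it matters more depends on the application.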