Three-part related entities not specifically identified by a sentence

Question

How do I train a Watson Knowledge Studio machine learning annotator to identify education information that is not part of a proper sentence, for example, two bullet points? How do I design a type system that identifies the entities without breaking them all apart? I've considered using relation annotations, but according to the official documentation, relation types should only be annotated if the sentence specifically mentions the relation. For example, "Mary works for IBM" is an instance of the employedBy relation type (Mary employedBy IBM). However, their own videos show "Ford F-150" being annotated with a manufacturedBy relation even though the sentence doesn't specifically state the relation, as in "The Ford F-150 struck a light pole." (F-150 manufacturedBy Ford)

This is the kind of text I'm working with:

B.A., City University of New York, 1995
M.A., New York University, 1997
Ph.D, Columbia University, 1999

I could annotate these with degree, school, and graduationYear entities, but I'll end up getting back "1995", "1997", "1999", "B.A.", "City University of New York", "Columbia University", "M.A.", "New York University", "Ph.D": a jumble that I can't work with, because I can no longer tell which degree goes with which school and which graduation year.

score 1 · Answer 1 · answered Nov 13 '17 at 03:01

For text such as the bullet points in your example, you may be able to improve sentence detection, so that WKS can work with each item as its own sentence, by using the dictionary-based tokenizer: https://console.bluemix.net/docs/services/knowledge-studio/create-project.html#wks_tokenizer

I imported your example text into WKS and checked the tokenization result: the text was split into 3 sentences, one per line. In this case you can annotate relations among the degree, school, and graduation year within each sentence.
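Once relations are annotated, the flat list of entities can be regrouped into one record per degree by following the relations. Here is a minimal sketch of that regrouping in Python. The relation type names (awardedBy, awardedIn), the mention ids, and the dict layout are all assumptions made up for illustration; they are not the actual output format of WKS or of any service the model is deployed to, so the field access would need adapting to the real response.

```python
def mention(mention_id, entity_type, text):
    """Build a hypothetical entity mention (layout is an assumption)."""
    return {"id": mention_id, "type": entity_type, "text": text}

# Hypothetical relation mentions for the three example lines.
# Relation names, ids, and structure are illustrative assumptions only.
relations = [
    {"type": "awardedBy", "source": mention("m1", "degree", "B.A."),
     "target": mention("m2", "school", "City University of New York")},
    {"type": "awardedIn", "source": mention("m1", "degree", "B.A."),
     "target": mention("m3", "graduationYear", "1995")},
    {"type": "awardedBy", "source": mention("m4", "degree", "M.A."),
     "target": mention("m5", "school", "New York University")},
    {"type": "awardedIn", "source": mention("m4", "degree", "M.A."),
     "target": mention("m6", "graduationYear", "1997")},
    {"type": "awardedBy", "source": mention("m7", "degree", "Ph.D"),
     "target": mention("m8", "school", "Columbia University")},
    {"type": "awardedIn", "source": mention("m7", "degree", "Ph.D"),
     "target": mention("m9", "graduationYear", "1999")},
]

def group_by_degree(relations):
    """Collect one record per degree mention from its outgoing relations."""
    records = {}
    for rel in relations:
        source, target = rel["source"], rel["target"]
        if source["type"] != "degree":
            continue  # this sketch only follows relations anchored on a degree
        # Key on the mention id so two identical degree strings stay separate.
        record = records.setdefault(source["id"], {"degree": source["text"]})
        record[target["type"]] = target["text"]
    return list(records.values())

for record in group_by_degree(relations):
    print(record)
```

Anchoring both relations on the degree mention is just one possible design; it keeps the grouping to a single pass, because every relation points back to the same key.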
