
Do any tools exist that allow Word users to label data using proprietary taxonomies for ingestion into a knowledge graph?

I have a group of subject matter experts who write very sensitive and domain-specific documents that I'd like to automatically ingest into a knowledge graph. They write in Word and are unlikely to move away from it unless I spend the kind of political capital I don't possess, especially as they work with external customers I can't influence who will use Word come hell or high water. Can I meet my experts where they are?

I have a rough idea of the kind of workflow I'd like to see.

  1. Expert writes document in Word
  2. Expert highlights some text and labels it as an object, and is prompted to add an appropriate relationship if the ontology requires it
    • Bonus: we can deploy entity recognition and relationship detection to recommend labels and relationships for them to confirm
    • Double bonus: and the learning is active/online, based on their labelling
  3. Document is parsed for triples and added to the graph (so ideally labels are inserted into the XML, but the document must still be readable by vanilla Word)
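To make step 3 concrete, here is a minimal sketch of the parsing side. It assumes labels are stored as tagged content controls (`w:sdt` elements carrying a `w:tag`), which vanilla Word can still open. The `ex:` prefixes, the `ex:mentions` predicate, the `ex:Drug` class, the sample sentence, and the function names are all hypothetical placeholders, not part of any existing tool or of my real ontology.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def extract_triples(docx_bytes, doc_id="ex:doc1"):
    """Return (subject, predicate, object) tuples, three per tagged content control.

    Assumption: the tag value of each content control holds an ontology class.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    triples = []
    count = 0
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue  # untagged controls carry no label, so skip them
        count += 1
        cls = tag.get(f"{{{W}}}val")
        # Join every text run inside the control to recover the labelled text
        text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        entity = f"{doc_id}#e{count}"
        triples.append((doc_id, "ex:mentions", entity))
        triples.append((entity, "rdf:type", cls))
        triples.append((entity, "rdfs:label", text))
    return triples


def make_sample_docx():
    """Build a stripped-down package holding only word/document.xml.

    This is enough for the parser above, but it is not a complete .docx
    that Word would open (content types and relationships are omitted).
    """
    body = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">Treatment with </w:t></w:r>'
        '<w:sdt><w:sdtPr><w:tag w:val="ex:Drug"/></w:sdtPr>'
        '<w:sdtContent><w:r><w:t>aspirin</w:t></w:r></w:sdtContent></w:sdt>'
        '</w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


for triple in extract_triples(make_sample_docx()):
    print(triple)
```

Relationships between two labelled objects are not covered here. They would need some extra convention, for example encoding a predicate and a target identifier in the tag, which is again only an assumption.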

There seem to be lots of tools that touch bits of this workflow, but I can't find anything that would truly meet my users where they are.

  • Word annotation Add-Ins focus on bibliographies and allowing teachers to add comments
  • WordPress seems to have lots of plugins that fill this kind of space, but my domain deals with extremely sensitive information, my documents are long, and the external customers my experts deal with will be using Word
  • Tools like Prodigy and doccano assume you're labelling something you don't have control over -- a finished document someone else made. I want my experts to label on the fly, which is far less work in the long run
  • Tools provided by graph database companies similarly assume you're powerless to actually label the document, and hard sell on automatic relationship detection and entity recognition

This space also suffers from a LOT of ambiguity in its key terms (graph, label, annotation, semantic) and lots of terms that orbit similar concepts from different domains (linked data, structured data, rich text, rich snippets, nanopublications etc), and it's made searching a real challenge.

Am I out of luck here? Do I need to get building an Add-In myself? Is this even technically possible using the Add-In framework?

illiter8
  • I'm just going to jump in here from a position of ignorance and suggest that you probably already know as well as anyone here (and better than me, for sure) what might be available as an addin of some kind. If you haven't found such a thing, yes, you'll probably have to construct it yourself, but here's what would really put me off: it's very difficult to control what happens in Word because Word does not make it easy to respond on a keystroke-by-keystroke basis... – jonsson May 25 '23 at 20:47
  • ...and a corollary of that is that even if you can come up with a representation in Word ML for your taxonomy data, it would be difficult to prevent your users from deleting, let's say, "substantial chunks of effort" by accident. All that said, the most obvious mechanisms for making a "named mark" in a Word document are inserting a named bookmark (every bookmark name must be unique within the document) or inserting a content control that you can either tag or title. That's with traditional VBA Word programming. VSTO might allow you more control, but I'm still not convinced it would be "enough" – jonsson May 25 '23 at 21:00

0 Answers