Specify the provenance of FHIR Resources generated by applying NLP over medical narratives

Question

Nearly 80 percent of clinical information in electronic health records (EHRs) is "unstructured" and in a format that health information technology systems cannot use.

It is therefore natural to apply computational techniques to generate structured data from medical records automatically. Several implementations are available, both commercial and fully open source. For example, cTAKES, CLAMP, NOBLE, ClarityNLP and others are all freely available solutions targeting this task.

They all address the specific need of generating structured data from unstructured medical notes. However, each delivers that structure in its own format, which could eventually be converted into FHIR.

However, a central problem is how to represent the Provenance of the extracted information. To the best of my knowledge, FHIR lacks a way to point to the precise location within the DocumentReference from which the information was extracted, to state which technology performed the extraction, and to express the level of "quality" of the extracted information.

Before submitting a Change Request (https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemBrowse) against the FHIR specification, it is recommended to expose the issue to the widest possible community, and stackoverflow.com is one of the main recommended channels.

For this purpose I am looking for opinions on the matter, namely on how to specify the provenance of FHIR Resources generated by applying NLP over medical narratives. For example, take this record from the Adverse Event Corpus of Gurulingappa et al. (https://doi.org/10.1016/j.jbi.2012.04.008):

10030778|Intravenous azithromycin-induced ototoxicity.|ototoxicity|43|54|azithromycin|22|34
123456789012345678901234567890123456789012345678901234567890
         1         2         3         4         5
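
The offsets in the record above can be checked with a short Python sketch. It assumes, based only on the ruler shown under the record, that positions are 1-based, that the end position is exclusive, and that counting starts at the beginning of the whole line (identifier included). The helper name `span` is made up for illustration. Character and byte positions coincide here only because the text is pure ASCII.

```python
# One pipe-delimited record, copied from the example above.
record = (
    "10030778|Intravenous azithromycin-induced ototoxicity."
    "|ototoxicity|43|54|azithromycin|22|34"
)

# Field order as it appears in the record: id, sentence, effect + offsets, drug + offsets.
(pmid, sentence, effect, effect_start, effect_end,
 drug, drug_start, drug_end) = record.split("|")


def span(line: str, start: str, end: str) -> str:
    """Return the text between a 1-based start and an exclusive end (assumed convention)."""
    return line[int(start) - 1 : int(end) - 1]


# Both annotated spans reproduce the annotated strings under the assumed convention.
assert span(record, drug_start, drug_end) == drug
assert span(record, effect_start, effect_end) == effect
```

If the corpus actually uses a different convention (for example offsets relative to the sentence only), the assertions fail, which makes the assumption easy to test.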

The question is how to represent in FHIR that this drug-induced problem was extracted from the specific byte positions 22-34 (drug) and 43-54 (problem) of the text (in this example, the title of the 1999 paper).

{
  "resourceType": "AdverseEvent",
  "id": "example",
  "actuality": "actual",
  "category": [
    {
      "coding": [
        {
          "system": "http://terminology.hl7.org/CodeSystem/adverse-event-category",
          "code": "product-use-error",
          "display": "Product Use Error"
        }
      ]
    }
  ],
  "event": {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "9062008",
        "display": "Ototoxicity (disorder)"
      }
    ],
    "text": "10030778|Intravenous azithromycin-induced ototoxicity."
  },
  "subject": {
    "reference": "Patient/example"
  },
  "date": "1999-02",
  "seriousness": {
    "coding": [
      {
        "system": "http://terminology.hl7.org/CodeSystem/adverse-event-seriousness",
        "code": "Non-serious",
        "display": "Non-serious"
      }
    ]
  },
  "severity": {
    "coding": [
      {
        "system": "http://terminology.hl7.org/CodeSystem/adverse-event-severity",
        "code": "mild",
        "display": "Mild"
      }
    ]
  },
  "recorder": {
    "reference": "Pharmacotherapy. 1999 Feb;19(2):245-8."
  },
  "suspectEntity": [
    {
      "instance": {
        "reference": "Azithromycin"
      }
    }
  ]
}

Currently the FHIR standard does not allow representing the precise byte position, the quality of the extraction, or the method used to perform it.

Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). I encourage you to compress the technical aspects of what you are asking into a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). Generally, this website is not designed for open-ended research questions, but rather, for very **specific** programming questions. — darksky, Feb 20 '19 at 14:23

score -1 · Answer 1 · answered Feb 20 '19 at 14:58

Excellent discussion. So far, the FHIR Provenance resource has focused on the most likely needs for provenance. An important principle of FHIR is that we focus on the most needed functionality before we focus on things that are less likely to be implemented or used. This is not to say that the things not yet supported are unlikely to be implemented or used; it simply points out the prioritization of what gets addressed.

So, looking at your use-case, I first ask whether it is realistic that a consumer of some FHIR Resource (e.g. Observation) will care about Provenance in more depth than a gross statement that the Observation was extracted from a specific Document. That is, is it important to record Provenance details any deeper than is possible today? This is not to say that it is not academically interesting; it surely is logical. But how useful is it, especially since it will be very expensive to record this level of detail?

Let's assume there is some reasonable, although small, need for this. The FHIR specification enables anyone to define extensions anywhere. So you could define an extension to the Provenance resource that supports your use-case, most likely an extension on Provenance.entity. If there is some reasonable cohort that needs this extension, it could be defined in a publicly accessible Implementation Guide using StructureDefinition, and be registered on fhir.org as such. If this seems even more useful, these extensions could be added to the FHIR specification within the Provenance resource.
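
As a rough illustration of the extension idea, the Python sketch below builds a Provenance resource as a plain dictionary and serializes it to JSON. Everything under `http://example.org` is hypothetical: the extension URL, its `label`, `start`, `end` and `confidence` parts, and the `policy` URI are made up for illustration and are not defined by HL7 or registered anywhere. The confidence values, the timestamp and the `Device/nlp-engine` reference are placeholders too. Using `Provenance.policy` for the extraction method is the option raised in the comments on this answer.

```python
import json

# Hypothetical base URL: these extensions are NOT defined by HL7 or any published guide.
BASE = "http://example.org/fhir/StructureDefinition"


def text_span_extension(label: str, start: int, end: int, confidence: float) -> dict:
    """Build a complex extension describing one extracted span (illustrative only)."""
    return {
        "url": f"{BASE}/nlp-text-span",
        "extension": [
            {"url": "label", "valueString": label},
            {"url": "start", "valueInteger": start},
            {"url": "end", "valueInteger": end},
            {"url": "confidence", "valueDecimal": confidence},  # placeholder score
        ],
    }


provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "AdverseEvent/example"}],
    "recorded": "2019-02-20T15:00:00Z",  # placeholder timestamp
    "policy": ["http://example.org/nlp/pipeline/v1"],  # hypothetical method identifier
    "agent": [{"who": {"reference": "Device/nlp-engine"}}],  # placeholder agent
    "entity": [
        {
            "role": "source",
            "what": {"reference": "DocumentReference/example"},
            "extension": [
                text_span_extension("drug", 22, 34, 0.9),
                text_span_extension("problem", 43, 54, 0.8),
            ],
        }
    ],
}

print(json.dumps(provenance, indent=2))
```

A real design would still have to settle the open points from this thread: whether offsets count bytes or characters, and how a quality score should be defined.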

However, I am not convinced that this functionality is widely needed beyond those extension-based mechanisms, especially since the need arises only with DocumentReference-based Binary content that is not structured.

The good news is that Provenance is still at FMM 3, and we do expect to try to make it normative in the next release, R5. So now is the time to have these excellent discussions.

Dear John, thank you for your fast and detailed reply. Yes, the use case is realistic and already supported (using proprietary, non-FHIR-compliant solutions) by hospital information systems / electronic health record software (I don't want to name solutions, to avoid advertising). The need comes from the physicians themselves, who want to be able to "drill down" to the real evidence quickly. Reading the full report would not save them time. I agree with the use of Extensions; however, we need them to be widely accepted and not "proprietary" by any means. — Luca Toldo, Feb 20 '19 at 15:18
As a side-note, Stack Overflow isn't really intended for discussion, so now that the question's been answered, discussion on the need and potentially the appropriate design of extensions is best handled on http://chat.fhir.org — Lloyd McKenzie, Feb 20 '19 at 15:35
So, what kind of extensions are you thinking would be needed? Is an extension to point to the starting byte within the Binary pointed to by DocumentReference sufficient? Or is there a need to have a starting byte and an ending byte? Is byte count the right approach, when some non-standard forms would have a different offset than would appear to be the right offset to a human (because of a file format header or encoding (XML))? Can the "method used" simply be identified in Provenance.policy? This is what IHE uses with document extraction. What kind of quality declaration would be made? — John Moehrke, Feb 20 '19 at 16:32
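
The byte-versus-character point in the last comment can be seen in a few lines of Python. This is only a sketch: the sentence is invented for illustration and UTF-8 is an assumed encoding. A single non-ASCII character before the term is enough to make the two offsets disagree.

```python
# Invented sentence; "\u00ef" (i with diaeresis) takes two bytes in UTF-8.
text = "Na\u00efve patient: azithromycin-induced ototoxicity."
term = "ototoxicity"

# Offset counted in characters (what a human reader would count).
char_start = text.index(term)

# Offset counted in bytes of the UTF-8 encoded document (what a Binary stores).
byte_start = text.encode("utf-8").index(term.encode("utf-8"))

# The byte offset is shifted by the one extra byte of the non-ASCII character.
assert byte_start == char_start + 1
```

Whichever unit an extension uses, it would need to state the unit and the encoding explicitly for the offsets to be reproducible.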
