Supporting paging of a FHIR binary document

Question

A patient can have a list of binary documents attached to them (not FHIR structured documents). Some are very large binary documents with 100+ pages such as PDFs or multi-page TIFFs.

Is there a standard way to page a binary document, in terms of:

Total pages in document
Get binary for page N

I see paging is specified for the /fhir/search resource but not on a document specifically. This might seem beyond the scope of FHIR, but if the document is 100 MB+ you don't want to have to download the whole file to read the first couple of pages.

I appreciate that some documents, such as text/html documents, can't reliably support paging.

...or is there a way to specify a list of pages as linked resources (/fhir/patient/11/document/22/?page=1) within a document?

score 1 · Accepted Answer · answered Jan 09 '14 at 06:10

FHIR treats binary resources as blobs. From a behavior perspective, it treats PDF documents, images, videos, text files and everything else identically, so there is no paging within a binary. Essentially you get the same behavior retrieving a binary from a FHIR repository as you would retrieving it from an XDS repository: you can fetch either the basic metadata (via DocumentReference) or the whole document.

That doesn't mean you couldn't define a custom (or even standard) Query on Binary that provided more smarts; it's just not part of the current FHIR standard. For it to make sense as part of the base standard, we'd need to see evidence of fairly broad support for this type of capability in existing systems (and ideally software libraries that make it easy to expose "pages" of PDF and other types of documents, and probably segments of video and audio clips while we're at it).

Some alternatives to consider:

register the really large documents or videos using separate binaries for each chapter or segment to reduce retrieval size and allow "smarter" retrieval
define an extension that provides a "thumbnail" that can be included with the DocumentReference to give a better sense of content prior to retrieval (e.g. a document abstract, a lower-bandwidth image, etc.) A standard extension will be provided to support this in the next 6 months or so as we define extensions for all ISO 21090 data type properties that didn't make it into core.
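
As a rough illustration of the first alternative, here is a minimal sketch of planning how a large document could be registered as several smaller binaries. Everything named here is an assumption for the example: the `Segment` type, the `plan_segments` function, the fixed pages-per-segment split, and the `Binary/doc22-segN` URL scheme are hypothetical and not defined by FHIR. Actually splitting the PDF or TIFF into those segments is left to whatever tooling the server already has.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    title: str        # display name, e.g. "Pages 1-10"
    first_page: int   # 1-based, inclusive
    last_page: int    # 1-based, inclusive
    binary_url: str   # hypothetical location of this segment's Binary


def plan_segments(total_pages: int, pages_per_segment: int, base_url: str) -> List[Segment]:
    """Divide a document into fixed-size page ranges, one Binary per range."""
    if total_pages < 1 or pages_per_segment < 1:
        raise ValueError("total_pages and pages_per_segment must be positive")
    segments = []
    starts = range(1, total_pages + 1, pages_per_segment)
    for index, first in enumerate(starts, start=1):
        # The final segment may be shorter than pages_per_segment.
        last = min(first + pages_per_segment - 1, total_pages)
        segments.append(
            Segment(
                title=f"Pages {first}-{last}",
                first_page=first,
                last_page=last,
                # Assumed URL scheme, for illustration only.
                binary_url=f"{base_url}/Binary/doc22-seg{index}",
            )
        )
    return segments
```

Each planned segment would then be stored as its own Binary and referenced from its own DocumentReference, so a client only downloads the range it needs.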

answered Jan 09 '14 at 06:10

Lloyd McKenzie


Thanks for the useful response. Our current system lets a user view a document alongside a list of thumbnails; selecting a thumbnail loads that particular page, which helps limit overall bandwidth. This is all done in the browser, calling our REST APIs directly using JavaScript. We still give the option to download the full original binary. – peter.swallow Jan 09 '14 at 10:20
Can we append any list of parameters to the Query? What do you mean by a custom (or even standard) Query? – peter.swallow Jan 09 '14 at 11:07
"Query" might work for retrieving a particular page/chapter/section. But the document would need to expose some metadata, a list of pages/chapters/sections with a resource end point and display name ideally. – peter.swallow Jan 09 '14 at 12:09
So it sounds like you'd want a complex extension on DocumentReference that supported exposing your thumbnail as well as a URL for the expanded version of the thumbnail. From there, you could simply query the Binary resource with either the URL for the full document or the URL for the specific pages. Your server would need to implement the magic of creating or exposing the thumbnails and page binaries when a new DocumentReference was created. That approach would minimize the amount of customization involved. (And would be ignorable by anyone who didn't understand it.) – Lloyd McKenzie Jan 09 '14 at 14:07
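
A minimal sketch of what such a complex extension might look like, built as plain JSON-style dictionaries. The extension URL, the sub-extension names (`number`, `thumbnail`, `content`), and the `Binary/{id}-page-{n}` URL scheme are all made up for illustration; they are not defined by HL7, and the exact JSON shape of nested extensions depends on the FHIR version in use.

```python
from typing import Dict

# Hypothetical extension URL; not published by HL7 or anyone else.
PAGE_EXT_URL = "http://example.org/fhir/extensions/document-page"


def page_extension(number: int, thumbnail_url: str, page_url: str) -> Dict:
    """Build one complex extension describing a single page."""
    return {
        "url": PAGE_EXT_URL,
        "extension": [
            {"url": "number", "valueInteger": number},
            {"url": "thumbnail", "valueUri": thumbnail_url},
            {"url": "content", "valueUri": page_url},
        ],
    }


def add_pages(document_reference: Dict, base_url: str, binary_id: str, page_count: int) -> Dict:
    """Return a copy of the DocumentReference with one page extension per page."""
    result = dict(document_reference)
    extensions = list(result.get("extension", []))
    for n in range(1, page_count + 1):
        extensions.append(
            page_extension(
                n,
                # Assumed URL scheme for the server-generated thumbnails and pages.
                f"{base_url}/Binary/{binary_id}-thumb-{n}",
                f"{base_url}/Binary/{binary_id}-page-{n}",
            )
        )
    result["extension"] = extensions
    return result
```

A client that understands the extension can list the pages and fetch each one as an ordinary Binary; a client that doesn't can ignore the extension and retrieve the full document as usual.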
Query essentially allows you to define complex service calls. So if you wanted to support paging through a document with the ability to do "next page", "previous page", "jump to page #", etc., you could define a Query that would return a Binary of the desired page and would take parameters for things like "pageOperation" (code of "next|prev|goto" or something like that) and "targetPage". – Lloyd McKenzie Jan 09 '14 at 14:19
You'd then define a Profile that identified the constraints on the Query (what parameters were allowed, what responses were allowed). A "custom" query would be one supported only by your system. A "standard" query would be one where the query profile was published by HL7 International or perhaps by an affiliate or IHE or some other SDO. (For now, you'll need to go the "custom" route, but getting it standardized is an option for down-the-road if you think the pattern is going to be needed by others.) – Lloyd McKenzie Jan 09 '14 at 14:22
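
To make the paging Query idea concrete, here is a minimal server-side sketch of resolving the "pageOperation" and "targetPage" parameters into a page number. The function name and the `current_page` argument are assumptions: since HTTP requests are stateless, this sketch supposes the client also sends the page it is currently on, which the comments above do not specify.

```python
from typing import Optional


def resolve_page(
    current_page: int,
    total_pages: int,
    page_operation: str,
    target_page: Optional[int] = None,
) -> int:
    """Map a paging request onto the 1-based page number to return."""
    if total_pages < 1:
        raise ValueError("document has no pages")

    if page_operation == "next":
        page = current_page + 1
    elif page_operation == "prev":
        page = current_page - 1
    elif page_operation == "goto":
        if target_page is None:
            raise ValueError("'goto' requires targetPage")
        page = target_page
    else:
        raise ValueError(f"unknown pageOperation: {page_operation!r}")

    # Reject requests that fall outside the document.
    if not 1 <= page <= total_pages:
        raise IndexError(f"page {page} is outside 1..{total_pages}")
    return page
```

The server would then render or look up that page and return it as a Binary; the Profile would document which parameter values are allowed and how out-of-range requests are reported.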
Sounds like you should use WebDAV with the FHIR syntax to me – Grahame Grieve Jan 12 '14 at 11:36
OK. I'll look at an extension on DocumentReference. Where could I return the number of pages/sections within the DocumentReference? Are there custom properties which can be returned within DocumentReference? This can be DB/IO heavy if the binary is analysed on the fly, so it may need to be a separate query. – peter.swallow Jan 13 '14 at 16:30
I'm not that familiar with WebDAV, but we want to expose all documents (text, HTML, RTF, Word, PDF, TIFF, etc.) in a common, easy format, such as text or JPGs. The client should not need any specific software installed to view them, especially if the intended audience could eventually be any hospital in the country or the world. We also intend to let the client send a media type, then convert the requested page/section and return it in that media type (assuming the API supports it). – peter.swallow Jan 13 '14 at 16:35
Is a DocumentManifest appropriate with a subset of pages each as separate DocumentResources? – peter.swallow Jan 13 '14 at 17:06
