
Suppose I have a large DICOM file (for example https://drive.google.com/drive/folders/1ejY0CjfEwS6SGS2qe_uRX2JvlruMKvPX?usp=sharing). I need to read the first frame of its pixel array as a numpy array, as quickly as possible.

import pydicom

directory = "path/to/file.dcm"  # placeholder: path to wherever the file is stored
dicom = pydicom.dcmread(directory)

Now, as mentioned in some other posts, the following line does the job:

first_image = dicom.pixel_array[0]

But my pixel array has shape (1691, 555, 800, 3), so dicom.pixel_array takes about 12 seconds to run. Since I need to read the first image of many such DICOM files, I need a much faster way.


My attempt:

I tried to use the raw pixel data, dicom[0x7fe0,0x0010]._value, which is a bytes object. I wanted to extract the bytes that belong to the first image and convert them to numpy, but I cannot work out which portion of the pixel data corresponds to the first image. The posts http://dicomiseasy.blogspot.com/2012/08/chapter-12-pixel-data.html and https://groups.google.com/g/dcm4che/c/ZQC2goCadiQ turned out not to be very helpful: the formula ROWS * COLUMNS * NUMBER_OF_FRAMES * SAMPLES_PER_PIXEL * (BITS_ALLOCATED/8) evaluates to 1332000 for a single frame in my case, which does not even divide 122320858, the length of the pixel data.
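
A quick check of that arithmetic, as a minimal sketch. The rows, columns, samples per pixel and frame count come from the array shape above; bits_allocated = 8 is an assumption (it is the value that yields 1332000), not something read from the file.

```python
# Values taken from the question; bits_allocated = 8 is an assumption.
rows, columns, samples_per_pixel = 555, 800, 3
number_of_frames = 1691
bits_allocated = 8  # assumed, not read from the file

# Uncompressed size of one frame and of the whole pixel array, in bytes.
frame_bytes = rows * columns * samples_per_pixel * (bits_allocated // 8)
total_bytes = frame_bytes * number_of_frames

pixel_data_length = 122320858  # length of the raw pixel data reported above

print(frame_bytes)                      # 1332000
print(total_bytes)                      # 2252412000
print(pixel_data_length % frame_bytes)  # 1108858, so not a whole number of frames
```

Under that assumption, the stored pixel data is far smaller than the uncompressed array would be, so it cannot simply be sliced at multiples of 1332000 bytes.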

Amit Joshi
温泽海
  • Your data is compressed, so if you directly access `PixelData`, you get the compressed and encapsulated version, which will not match the real data in length. If you access `pixel_data` instead, the data is decompressed first, which accounts for the time needed. There is currently no possibility to access single compressed frames in pydicom, though there is [a PR](https://github.com/pydicom/pydicom/pull/1447) in the works intended to change this. – MrBean Bremen Jul 02 '22 at 17:37
  • Thanks for responding! How do you access `pixel_data`? `dicom.pixel_data` raises an error saying there is no such attribute. – 温泽海 Jul 02 '22 at 17:47
  • Sorry, I meant `pixel_array`, that was a typo. This is what you have already done, but that will decompress the whole image, as I wrote. – MrBean Bremen Jul 02 '22 at 18:36

1 Answer


As described in this GitHub issue, pydicom currently has no native way to do this. You can use the highdicom package instead, specifically the ImageFileReader class in the highdicom.io submodule. For completeness, here is the example from the documentation, which reads a multi-frame DICOM file one frame at a time:

>>> from pydicom.data import get_testdata_file
>>> from highdicom.io import ImageFileReader

>>> test_filepath = get_testdata_file('eCT_Supplemental.dcm')
>>>
>>> with ImageFileReader(test_filepath) as image:
...     print(image.metadata.SOPInstanceUID)
...     for i in range(image.number_of_frames):
...         frame = image.read_frame(i)
...         print(frame.shape)
1.3.6.1.4.1.5962.1.1.10.3.1.1166562673.14401
(512, 512)
(512, 512)

Since your data does not contain an ICC profile, you should pass correct_color=False to the read_frame function. You should also comment out the first print, since the absence of this attribute causes an attribute error when the metadata is read. With these changes, the example above, adapted to work on your data, looks like this:

>>> with ImageFileReader(test_filepath) as image:
...     # print(image.metadata.SOPInstanceUID)
...     for i in range(image.number_of_frames):
...         frame = image.read_frame(i, correct_color=False)
...         print(frame.shape)
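
Since the question only asks for the first frame, the loop can be dropped. The following is a minimal sketch of that idea, not a tested solution for your file: it uses only the ImageFileReader and read_frame calls shown above, and the names read_first_frame, read_first_frames and the reader_cls parameter are hypothetical helpers introduced here for illustration (reader_cls exists only so a stand-in reader can be swapped in for testing).

```python
def read_first_frame(path, reader_cls=None):
    """Return frame 0 of a multi-frame DICOM file using one read_frame call.

    reader_cls is a hypothetical hook (not part of highdicom) that lets a
    stand-in reader be injected for testing; by default the real
    highdicom.io.ImageFileReader is imported and used.
    """
    if reader_cls is None:
        # Imported lazily so this module loads even without highdicom installed.
        from highdicom.io import ImageFileReader as reader_cls

    with reader_cls(path) as image:
        # correct_color=False skips the ICC-profile step, as in the example above.
        return image.read_frame(0, correct_color=False)


def read_first_frames(paths, reader_cls=None):
    """Map each path to its first frame (a plain loop, no parallelism)."""
    return {path: read_first_frame(path, reader_cls) for path in paths}
```

Usage would be along the lines of first_image = read_first_frame("path/to/file.dcm"), where the path is a placeholder for your own file.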

For further issues, always check the highdicom documentation first.

Aelius
  • Thanks for the response! Can you open the file I gave in the post for me? I tried to open that one with your method and it does not seem to work. – 温泽海 Jul 18 '22 at 16:04
  • What type of error are you getting? Anyway, I think your file is badly formatted, since when I run the example I can see only one frame. I couldn't get the example to work completely, but I think the problem is in your file. – Aelius Jul 18 '22 at 20:21
  • By copy pasting the code in my post, you should be able to read the file (at least I can). Calling `dicom.pixel_array.shape` you will see that my file has `1691` frames. The pixel data has `122320858` elements. When running your code, I get `No ICC Profile found in image metadata.` and `'Dataset' object has no attribute 'OpticalPathSequence'` and `'Dataset' object has no attribute 'ICCProfile'` as errors. – 温泽海 Jul 18 '22 at 20:29
  • I'm updating the answer. With the new edits, you should be able to run the example code. – Aelius Jul 18 '22 at 20:51
  • @温泽海 please accept the answer if it solves your problem now – Aelius Jul 19 '22 at 12:49
  • I know. It seems to be working. I will do some more testing and then accept your answer. – 温泽海 Jul 19 '22 at 18:19
  • This is exactly what I have been looking for. Thank you! – 温泽海 Jul 19 '22 at 19:12