How can I keep the association between image and coordinate (z-axis) after a preprocess function?

Question

I'm working on a preprocessing function that takes DICOM files a input and returns a 3D np.array (image stack). The problem is that I need to keep the association between ImagePositionPatient[2] and the relative position of the processed images in the output array.

For example, if a slice with ImagePositionPatient[2] == 5 is mapped to a processed slice in position 3 in the returned stack, I need to return another array that has 5 in the third position, and the same for all original slices. For slices created during processing by interpolation or padding, the array shall contain a palceholder value like -99999 instead.

I paste my code here. EDIT: new simplified version

def lung_segmentation(patient_dir):
    """
    Load the dicom files of a patient, build a 3D image of the scan, normalize it to (1mm x 1mm x 1mm) and segment
    the lungs
    :param patient_dir: directory of dcm files
    :return: a numpy array of size (384, 288, 384)
    """

    """ LOAD THE IMAGE """

    # Initialize image and get dcm files
    dcm_list = glob(patient_dir + '/*.dcm')
    img = np.zeros((len(dcm_list), 512, 512), dtype='float32') # inizializza un 
    # vettore di len(..) di matrici di 0 e di ampiezza 512x512
    z = []

    # For each dcm file, get the corresponding slice, normalize HU values, and store the Z position of the slice
    for i, f in enumerate(dcm_list):
        dcm = dicom.read_file(f)
        img[i] = float(dcm.RescaleSlope) * dcm.pixel_array.astype('float32') + float(dcm.RescaleIntercept)
        z.append(dcm.ImagePositionPatient[-1])

    # Get spacing and reorder slices
    spacing = list(map(float, dcm.PixelSpacing)) + [np.median(np.diff(np.sort(z)))]
    print("LO SPACING e: "+str(spacing))
    # spacing = list(map(lambda dcm, z: dcm.PixelSpacing + [np.median(np.diff(np.sort(z)))]))
    img = img[np.argsort(z)]

    """ NORMALIZE HU AND RESOLUTION """

    # Clip and normalize
    img = np.clip(img, -1024, 4000) # clippa con minimo a 1024 e max a 4k
    img = (img + 1024.) / (4000 + 1024.)

    # Rescale 1mm x 1mm x 1mm
    new_shape = map(lambda x, y: int(x * y), img.shape, spacing[::-1])
    old_shape = img.shape
    img = resize(img, new_shape, preserve_range=True)

    print('nuova shape calcolata'+ str(img.shape)+' con calcolo eseguito su img_shape: '+str(old_shape)+' * '+str(spacing[::-1]))

    lungmask = np.zeros(img.shape) # WE NEED LUNGMASK FOR CODE BELOW

    lungmask[int(img.shape[0]/2 - img.shape[0]/4) : int(img.shape[0]/2 + img.shape[0]/4),
             int(img.shape[1]/2 - img.shape[1]/4) : int(img.shape[1]/2 + img.shape[1]/4),
             int(img.shape[2]/2 - img.shape[2]/4) : int(img.shape[2]/2 + img.shape[2]/4)] = 1
             # I set to value = 1 some pixel for executing code below, free to change

    """ CENTER AND PAD TO GET SHAPE (384, 288, 384) """

    # Center the image

    sum_x = np.sum(lungmask, axis=(0, 1))
    sum_y = np.sum(lungmask, axis=(0, 2))
    sum_z = np.sum(lungmask, axis=(1, 2))

    mx = np.nonzero(sum_x)[0][0]
    Mx = len(sum_x) - np.nonzero(sum_x[::-1])[0][0]
    my = np.nonzero(sum_y)[0][0]
    My = len(sum_y) - np.nonzero(sum_y[::-1])[0][0]
    mz = np.nonzero(sum_z)[0][0]
    Mz = len(sum_z) - np.nonzero(sum_z[::-1])[0][0]

    img = img * lungmask
    img = img[mz:Mz, my:My, mx:Mx]

    # Pad the image to (384, 288, 384)
    nz, nr, nc = img.shape

    pad1 = int((384 - nz) / 2)
    pad2 = 384 - nz - pad1

    pad3 = int((288 - nr) / 2)
    pad4 = 288 - nr - pad3

    pad5 = int((384 - nc) / 2)
    pad6 = 384 - nc - pad5

    # Crop images too big
    if pad1 < 0:
        img = img[:, -pad1:384 - pad2]
        pad1 = pad2 = 0
        if img.shape[0] == 383:
            pad1 = 1

    if pad3 < 0:
        img = img[:, :, -pad3:288 - pad4]
        pad3 = pad4 = 0
        if img.shape[1] == 287:
            pad3 = 1

    if pad5 < 0:
        img = img[:, :, -pad5:384 - pad6]
        pad5 = pad6 = 0
        if img.shape[2] == 383:
            pad5 = 1

    # Pad
    img = np.pad(img, pad_width=((pad1 - 4, pad2 + 4), (pad3, pad4), (pad5, pad6)), mode='constant')
    # The -4 / +4 is here for "historical" reasons, but it can be removed

    return img

reference library for resize methods etc. is skimage

@MrBeanBremen thanks for the reply. As you can see I do a resize operation that interpolate new images in the images-stack, so the sort operation that I done at that start of the function is annuled from this operation (and eventually from the pad one) since new images are introduced in the stack; In the end I need to keep track of the original images returned in the stack, in order to recognize them from ones introduced by interpolation and padding — Agostino Dorano, Jun 06 '20 at 15:23
@MrBeanBremen I just edited the question with new code :). For resize question, the skimage doc say this: "Performs interpolation to up-size or down-size N-dimensional images". Thank you a lot again for the reply! :) — Agostino Dorano, Jun 06 '20 at 16:33
@MrBeanBremen Interpolation will be done in z direction if the slices spacing is !=1.. slice distance can be various.. usually in my case from 1 mm to 5 mm; the sorting code works anyway because notwithstanding slices don't have 1mm spacing , values are anyway sortable. Only if slice spacing == 1 mm no interpolation will be done in z direction — Agostino Dorano, Jun 06 '20 at 17:21
@MrBeanBremen slice distance can be also 2.5 mm or other decimal numbers.. range is 1.00 to 5.00 mm, but no less then 1.00 — Agostino Dorano, Jun 06 '20 at 17:53
If you have a slice distance of 2.5mm, and resize your image to have 1mm distance, you will lose every second "original" image (you will get interpolated images at 2 and 3 mm distance instead). How do you define that original index in that case? Maybe I still didn't understand it correctly... — MrBean Bremen, Jun 06 '20 at 17:54
Welcome to SO; please see why [a wall of code isn't helpful](https://idownvotedbecau.se/toomuchcode/). — desertnaut, Jun 07 '20 at 01:32
@MrBeanBremen you're right, I'll change that part of code with the floor or ceil of the spacing; For the rest do you have some idea about resolve? — Agostino Dorano, Jun 07 '20 at 10:02
Ok, I have put some answer together, will remove my comments here to avoid the clutter. — MrBean Bremen, Jun 07 '20 at 16:07

MrBean Bremen · Accepted Answer · 2020-06-08T06:22:58.210

I will try to give at least some hints to the answer. As has been discussed in the comments, resizing may remove the processed data at the original positions due to needed interpolation - so in the end you have to come up with a solution for that, either by changing the resizing target to a multipe of the actual resolution, or by returning the interpolated positions instead.

The basic idea is to have your positions array z be transformed the same as the images are in z direction. So for each operation in processing that changes the z location of the processed image, a similar operation has to be done for z.

Let's say you have 5 slices with a slice distance of 3mm:

>>> z
[0, 6, 3, 12, 9]

We can make a numpy array from it for easier handling:

z_out = np.array(y)

This corresponds to the unprocessed img list.
Now you sort the image list, so you have to also sort z_out:

img = img[np.argsort(z)]
z_out = np.sort(z_out)

>>> z_out
[0, 3, 6, 9, 12]

Next, the image is resized, introducing interpolated slices.
I will assume here that the resizing is done so that the slice distance is a multiple of the target resolution during resizing. In this case you to calculate the number of interpolated slices, and fill the new position array with corresponding placeholder values:

slice_distance = int((max(z) - min(z)) / (len(z) - 1))
nr_interpolated = slice_distance - 1  # you may adapt this to your algorithm

index_diff = np.arange(len(z) - 1)  # used to adapt the insertion index 
for i in range(nr_interpolated):
    index = index_diff * (i + 1) + 1  # insertion index for placeholders
    z_out = np.insert(z_out, index, -99999) # insert placeholder for interpolated positions

This gives you the z array filled with the placeholder value where interpolated slices occur in the image array:

>>> z_out
[0, -99999, -999999, 3, -99999, -999999, 6, -99999, -999999, 9, -99999, -999999, 12]

Then you have to do the same padding as for the image in the z direction:

img = np.pad(img, pad_width=((pad1 - 4, pad2 + 4), (pad3, pad4), (pad5, pad6)), mode='constant')

# use 'minimum' so that the placeholder is used
z_out = np.pad(z_out, pad_width=(pad1 - 4, pad2 + 4), mode='minimum')

Assuming padding values 1 and 3 for simplicity this gives you:

>>> z_out
[-99999, 0, -99999, -999999, 3, -99999, -999999, 6, -99999, -999999, 9, -99999, -999999, 12, -99999, -999999, -99999]

If you have more transformations in z directions, you have to do the corresponding changes to z_out. If you are done, you can return your position list together with the image list:

return img, z_out

As an aside: your code will only work as intented if your image has a transverse (axial) orientation, otherwise you have to calculate the z position array from Image Position Patient and Image Orientation Patient, instead of just using the z component of the image position.

thank you, your comment was very helpful. Do you think there is a way to keep the associations if you resize to a value : `int(len (z) * z_spacing)` if z_spacing is a decimal? — Agostino Dorano, Jun 09 '20 at 16:09
As I wrote - this depends on how you want to handle it. For example, you may calculate and return the interpolated positions, and let the caller decide how to handle them. Or you can decide to mark the nearest integer value as "original" - there is a bit more calculation involved, but nothing too complicated. You could first calculate the interpolated positions, and then iterate over them and mark the ones that are not near an integer with a placeholder. — MrBean Bremen, Jun 09 '20 at 16:32
Rather than the nearest integer, wouldn't it be more appropriate to mark as original the slices with z-coordinate closest to the originals z? — Agostino Dorano, Jun 09 '20 at 16:45
don't worry and thank you very much, your contribution has been very useful :) — Agostino Dorano, Jun 09 '20 at 17:44
I have a question: how can I calculate the intepoleted positions in compliant manner of resizing of img? — Agostino Dorano, Jul 04 '20 at 10:13
That's just simple arithmetrics: take the overall thickness, devide it by the new number of slices (e.g. the z size), and create an ascending array with that slice distance as step (and maybe add the value of the first slice if it is not zero). — MrBean Bremen, Jul 04 '20 at 12:56
I'm using now this solution: "minim = z_out[0] ---------- z_out = np.array([(minim +i) for i in range(0,img.shape[0])])" where img.shape is z-dimension after resize and z_out is the sorted z like in the code above.. what do you think about this solution? — Agostino Dorano, Jul 04 '20 at 13:51

How can I keep the association between image and coordinate (z-axis) after a preprocess function?

1 Answers1