
How can I localize 2D projections in (dense, transparent) 3D volumes? I only seem to find algorithms that work on datasets with opaque surfaces; those rely on geometric assumptions that do not hold in direct volume rendering (DVR) environments with semi-transparent objects.

I have CT/MRI scans (3D volumes) and X-rays (2D projections) of the same object, and I need to match their coordinate systems. The goal is to find the position of features in the volume that are visible in the X-ray but not in the CT/MRI.

My idea was to synthesize images from the volume and use image-matching algorithms to find the 'camera parameters' of the X-ray: I set up a renderer that synthesizes 2D images from the volume (DRR, digitally reconstructed radiograph) and added an ORB detector and matcher on top. Trying to brute-force it (iteratively moving the camera position closer to the current best match, e.g. with binary search or golden-section search) does not yield stable results.
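
Simplified, the matching step looks roughly like this (a sketch rather than the exact code from the repo; `render_drr` stands in for my DRR renderer, and the scoring is just one way to rank candidate poses):

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def pose_score(volume, xray, camera_pose):
    """Render a DRR for a candidate camera pose and score its similarity to the X-ray.

    `render_drr` is a placeholder for the actual renderer; both images are
    8-bit single-channel arrays, since ORB works on grayscale input.
    """
    drr = render_drr(volume, camera_pose)
    _, des_xray = orb.detectAndCompute(xray, None)
    _, des_drr = orb.detectAndCompute(drr, None)
    if des_xray is None or des_drr is None:
        return -np.inf
    matches = matcher.match(des_xray, des_drr)
    if not matches:
        return -np.inf
    # lower Hamming distance = better; average the 50 best matches
    best = sorted(matches, key=lambda m: m.distance)[:50]
    return -float(np.mean([m.distance for m in best]))
```

The camera parameters are then varied iteratively towards the best-scoring pose (golden-section search along individual parameters), and that is the part that does not converge reliably.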

I suspect this is partly due to the many degrees of freedom, but also because the ORB detector is fed with a single-channel dataset and has trouble finding good features. I have a good starting point for the search but am not sure whether the features remain 'stable enough' even for small variations of the camera position. I thought of using SLAM algorithms to map features of the volume and then use their relocalization feature, but again, their geometric assumptions do not hold in my environment. Using 'simpler' image-matching algorithms is difficult as well, since the DRRs will never look exactly like the original and the X-ray is heavily obstructed (~50% usable).

I would greatly appreciate any ideas or references to related works. Thanks!

Edit: Source code is on GitHub. Unfortunately I cannot share actual images, as they contain sensitive patient information. As an example, I can share a rendered image from a dummy head (an actual scan of an artificial head with true-to-life densities):

[rendered image of the dummy-head phantom]

In the real X-ray images, roughly 50% of the image will be obstructed by surgical instruments. Currently my projections are rudimentary but render at upwards of 300 fps; with ORB detection and matching added I still get around 120 fps. I could make them better and trade off some performance, but the entire process of matching two images to one volume must take place within a few minutes, as it will be used during operations. However, my matcher does not even reliably match self-generated images: rotations work well, but a translation of a few cm already breaks it. Might the ORB detector work better if I add filters to the image, for example a Sobel operator for edge detection?
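
To illustrate what I mean by adding filters, a rough sketch (the CLAHE contrast-normalization step is just a guess that it might help; whether any of this actually improves the ORB features is exactly my question):

```python
import cv2
import numpy as np

def preprocess_for_orb(img):
    """Emphasize edges before feature detection.

    Expects an 8-bit single-channel image; returns an 8-bit edge-magnitude image.
    """
    # local contrast normalization (CLAHE) - purely an assumption that it helps
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)
    # Sobel gradient magnitude as an edge image
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    # scale back to 8-bit so ORB can consume it
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```

The idea would be to run this on both the DRR and the X-ray before detection, so that features come from edges rather than raw intensities.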

  • If you can post any example images it'd help us make specific recommendations. It looks like you've already put a fair bit of effort into this so generic suggestions probably won't be good enough. – Ian Chu Aug 24 '23 at 13:46
  • I added an image from a volume that I use for testing. I can't share actual images unfortunately since they contain patient information. – lematthias Aug 24 '23 at 15:07
  • this sounds like a 2d-3d pose recovery problem. in computer vision, that's a common task. even your xray projections are perspective projections, just with a long focal length ("camera", xray source, is several meters away). the computer vision problem commonly works with *surface* features. that's a limitation in the available "features" but such features are inherently 2d, and 2d-2d matching is a lot easier. this might also relate to matching of pictures ***to point clouds*** – Christoph Rackwitz Aug 25 '23 at 08:18
  • without knowing the state of the art for this problem space... there might be more elegant approaches but your proposal should work. -- you can limit the degrees of freedom. you can guesstimate the object-camera distance (object to xray source). focal length is just a scale factor. two angles remain. – Christoph Rackwitz Aug 25 '23 at 08:24
  • if you can share the volume data of your phantom too, that would be great. -- I'm not sure if I will have the time to experiment on this. I don't even have a solution for calculating DRR. if there's a *Python* library for working with your volumetric data, that would be convenient. – Christoph Rackwitz Aug 25 '23 at 08:28
  • yes, the usual 2D feature descriptors (SIFT being the most well known one, also free to use since the patent expired) may not work well on xrays. features in xrays are usually just edges, rarely *corners*. a traditional assumption of feature points is that they're well located in all dimensions, i.e. pinned down points, not points that could slip along edges. perhaps the pose estimation could be reformulated to work with such slippy features. – Christoph Rackwitz Aug 25 '23 at 08:37
  • to be sure: the whole goal of this is marker-less navigation, right? usually, for surgical navigation, xray-opaque balls are attached to the bone, then xrays are taken. those markers are balls so that an infrared camera can locate them, as well as marked surgical instruments. – Christoph Rackwitz Aug 25 '23 at 08:41
  • perhaps throwing AI at the problem might work better. AI could learn xray-specific feature types, even 3D-2D corresponding features. AI could even be trained to directly infer the pose, given the volume and query projection. it would probably learn anatomy in 3D and what it looks like in projections. – Christoph Rackwitz Aug 25 '23 at 08:42
  • Yes, the goal is to get rid of the markers entirely because they sometimes get in the way. They use small plates with multiple opaque dots on four sides of the head. Their positioning also limits the possible angles for x-rays, since two of these plates need to overlap in the x-ray. I uploaded the phantom as nifti to the github repo, there are many tools that can handle this file type. My dataset is too small to properly train an AI. Even if I were to train it with synthesized DRRs (which could yield questionable results) I would need many CTs. – lematthias Aug 25 '23 at 10:18
  • I know the distance emitter to detector and (roughly) emitter to patient from DICOM. I can probably just run an experiment to measure the FOV. That gives me a good starting point and removes one degree of freedom (scaling). Rotation is easily adjusted by hand up to a few degrees. But then I still need to search camera origin, direction and up vector (7 dims). If I can find the center (maybe by using the outline?) then I could have the camera always look at that point and remove 3 dimensions (a rough sketch of that parameterization is below the comments)... – lematthias Aug 25 '23 at 10:42
  • if this problem is not the *primary* problem (of a thesis or something), you could call up various companies specializing in surgical navigation. they might have solutions or could help with a prototype. – Christoph Rackwitz Aug 25 '23 at 16:24
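
What I mean by the look-at idea in the comment above, as a rough sketch (function and parameter names are made up for illustration; the target point would be an estimated volume center): the camera origin lives on a sphere around the target, so the search reduces to two angles, the source-to-object distance, and an in-plane roll.

```python
import numpy as np

def lookat_camera(theta, phi, roll, distance, target):
    """Camera pose from spherical angles, roll and distance around a fixed target point.

    theta, phi, roll are in radians; target is the estimated volume center (3-vector).
    Returns (origin, forward, up) as a DRR renderer might expect; names are illustrative.
    """
    # camera origin on a sphere around the target
    origin = target + distance * np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])
    forward = target - origin
    forward /= np.linalg.norm(forward)
    # build an up vector orthogonal to the viewing direction
    world_up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, world_up)
    if np.linalg.norm(right) < 1e-8:  # looking straight up or down
        right = np.array([1.0, 0.0, 0.0])
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    # rotate the up vector around the viewing axis by the roll angle
    up = np.cos(roll) * up + np.sin(roll) * right
    return origin, forward, up
```

With the emitter-to-patient distance known (roughly) from DICOM and the target fixed, only theta, phi and roll remain to search, plus perhaps a small in-plane translation of the target.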

0 Answers