I have recently started learning about supervised monocular depth estimation, using the NYU-v2 dataset. There it is easy to design a torch loader and pre-process the data, since the structure of the dataset is quite clear. In the case of the KITTI dataset, however, it is very confusing. Is it possible to use KITTI for supervised monocular depth estimation? I found a torch loader for KITTI here: https://github.com/joseph-zhang/KITTI-TorchLoader, but I don't understand how to use it for depth estimation on KITTI; the folder structure is quite different! My plan is to train a simple CNN using a supervised monocular depth approach.
-
Isn't it clear in the readme page of the repository? The dataset returns dictionaries containing `"left_img"`, `"right_img"`, and `"depth"`. – Ivan May 30 '22 at 08:50
-
Of course. But the question is how we can use the official depth prediction data for monocular depth estimation (using one photo), not left and right (binocular)? This loader gives us left and right RGB images – PNF May 31 '22 at 09:44
-
Well, what's stopping you from only using a single camera view? – Ivan May 31 '22 at 09:50
-
Is the depth also provided for both left and right? As you mentioned, this dataset returns a dictionary containing `"left_img"`, `"right_img"`, and `"depth"`. Is this depth for the left or for the right view? I mean, I need one RGB image and the corresponding depth map for it to train the network – PNF May 31 '22 at 14:11
2 Answers
I think it is plausible, since the KITTI dataset contains depth maps with the corresponding raw LiDAR scans and RGB images (left image, right image and depth map) (KITTI). I don't know exactly how the GitHub repo works, but the dataset/dataloader should be in a similar format. However, taking a look at the repo files, I think you only need to install the library and then pass as input the root_path of your dataset and the PyTorch image transformations (see the sketch after the layout below).
```
root_path
|- KITTIDepth
|- KITTIRaw
```
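For what it's worth, here is a minimal sketch of how I would wire this up. The import path, the `KittiDataset` constructor, and its keyword arguments below are assumptions based on the README and the comments, so check the repo for the exact module layout and signatures.

```python
import torch
from torchvision import transforms

# Hypothetical import path; check the repo for the actual module layout.
from datasets.kitti import KittiDataset

# Plain PyTorch image transforms passed to the loader.
img_transform = transforms.Compose([transforms.ToTensor()])

# root_path must contain the KITTIDepth and KITTIRaw folders shown above.
# The keyword names (root_path, mode, transform) are assumptions.
train_set = KittiDataset(root_path="./root_path",
                         mode="train",
                         transform=img_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=4, shuffle=True)

sample = next(iter(train_loader))
# According to the README, each sample is a dict:
print(sample["left_img"].shape, sample["depth"].shape)
```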

The repository states that the dense depth maps are completions of the LiDAR ray maps, projected onto and aligned with the raw KITTI dataset.
Andreas Geiger et al., Vision meets Robotics: The KITTI Dataset
Looking at the dev toolkit for KITTI, the `get_depth` function receives as an argument the id of the camera onto which the Velodyne points are projected. This function is called in the dataloader with `cam=self.cam`, which is set as an attribute of the `Kittiloader` instance.
In other words, you can choose for which camera the Velodyne projection and depth completion are performed. By default, `cam` is set to `2`, which means `cam_2`, the left camera view.
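Since the depth map is aligned with `cam_2`, you can pair `left_img` with `depth` and supervise a single-image depth network directly. Here is a minimal sketch, assuming the batches are dictionaries as described in the README and that `MyDepthCNN` is a placeholder for your own network; note that KITTI ground-truth depth is sparse, so invalid (zero) pixels must be masked out of the loss:

```python
import torch
import torch.nn.functional as F

def masked_l1_loss(pred, target):
    # KITTI ground-truth depth is sparse: pixels without a LiDAR return are 0
    # and must be excluded from the supervision signal.
    valid = target > 0
    return F.l1_loss(pred[valid], target[valid])

model = MyDepthCNN().cuda()          # placeholder for your own CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for batch in train_loader:           # loader built from the KITTI dataset, e.g. as in the other answer
    img = batch["left_img"].cuda()   # RGB from cam_2, the left camera
    gt = batch["depth"].cuda()       # depth map aligned with cam_2
    pred = model(img)                # predicted depth, same spatial size as gt
    loss = masked_l1_loss(pred, gt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```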

-
OK, so in KittiDataset we define a Kittiloader object with, for example, cam=2, so we get images from the left camera. For the training loop, we have the left image and the depth. Is this depth the ground truth for that image? Can we train the CNN in a supervised manner (not with left and right)? Actually, I am talking about the KITTIDepth modules, not KITTIRaw – PNF Jun 01 '22 at 18:55
-
Do you think this is the right way to use this loader: https://i.stack.imgur.com/cd6tQ.png – PNF Jun 01 '22 at 23:37
-
If the depth is aligned with the left camera, then the depth map and the left-view RGB image correspond to each other. So yes, you can supervise your model to perform single-image depth regression. Your code looks good to me; depending on what the `'rgb'` key refers to, it should rather be `'left_img'`. – Ivan Jun 02 '22 at 08:05