I was reading the forum and this post caught my attention, since I have to perform the same operation. However, the accepted answer uses the intrinsic parameters of the RGB camera, which I do not understand. Why not use the intrinsic parameters of the depth camera for the projection onto the image plane, since the point cloud is constructed from the depth camera?
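To make clear which operation I mean, here is a minimal sketch of the standard pinhole projection. The function name and all intrinsic values are placeholders I made up for illustration, not taken from the linked answer, and the sketch ignores lens distortion and any rigid transform between the two cameras. It only shows that the same 3D point lands on different pixels depending on which set of intrinsics is used:

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3D point given in a camera frame to pixel coordinates (u, v).

    Plain pinhole model: no lens distortion, no extrinsic transform.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


# Hypothetical intrinsics, chosen only for illustration.
depth_intrinsics = dict(fx=365.0, fy=365.0, cx=256.0, cy=212.0)
rgb_intrinsics = dict(fx=1050.0, fy=1050.0, cx=960.0, cy=540.0)

point = (0.5, -0.25, 2.0)  # metres, z pointing forward

print(project_point(point, **depth_intrinsics))  # (347.25, 166.375)
print(project_point(point, **rgb_intrinsics))    # (1222.5, 408.75)
```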
(I am posting this as a new question because I do not have enough reputation to leave a comment.)