In the blog post you linked, it says that you run the app once per image set (or at least one image per set): one set from the IR camera and another set from the RGB camera. You then get the extrinsics for that set, in other words relative to the camera the images were taken with. So, if you use the IR image set, you get ir_M_World.
The intrinsics are usually the camera matrix used in the pinhole camera model

s p = K [R|T] P

where s is a scaling factor, p is a 2D point in homogeneous coordinates, K is the camera matrix, [R|T] is the extrinsics matrix (rotation and translation) and P is a 3D point.
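To make the projection concrete, here is a minimal NumPy sketch of s p = K [R|T] P; the values for K, R and T below are made up purely for illustration, not taken from your setup:

```python
import numpy as np

# Made-up intrinsics K (fx, fy, cx, cy chosen arbitrarily for the example)
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Made-up extrinsics [R|T]: identity rotation, camera shifted along Z
R = np.eye(3)
T = np.array([[0.0], [0.0], [0.5]])
RT = np.hstack([R, T])                      # 3x4 extrinsics matrix

P = np.array([[0.1], [0.2], [2.0], [1.0]])  # 3D point in homogeneous coordinates

sp = K @ RT @ P                             # s * p = K [R|T] P
p = sp / sp[2]                              # divide by the scale s to get (u, v, 1)
print(p.ravel()[:2])                        # pixel coordinates of the projected point
```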
Now, you need to understand a little about how this calibration method works. First you have a grid of points (in your case the intersections of the chessboard squares). These points have to be represented in 3D coordinates. Since you normally do not care where these points actually are (unless you have a fixed coordinate system you want to follow), they are taken as follows:
[0,0,0] [1,0,0] ... [n, 0,0]
[0,1,0] ... ...
... ... ...
[0,m,0] ... [n, m, 0]
The [0,0,0] point may be in the center of the grid, and the steps between points can be real cm/mm/m measurements, but for convenience they are taken as unit steps like this. Then you get the pose (rotation and translation) from this synthetic point cloud to one camera, and then to the other camera. Since both poses are expressed relative to the same points, you can relate the two cameras as explained in the blog post.
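As an illustration of that whole chain, here is a sketch using OpenCV (this is my assumption of how to do it; the blog's app or the GML toolbox may do it differently internally). It builds the synthetic grid of object points, recovers each camera's pose relative to that grid with cv2.solvePnP, and then chains the two poses to relate the cameras. The intrinsics, distortion and corner values are made up only so the snippet runs end to end; in practice they come from your calibration and your corner detector:

```python
import numpy as np
import cv2

# Synthetic planar grid of object points: (0,0,0), (1,0,0) ... (n,m,0).
# square_size can be the real square edge in mm/cm/m if you want metric results.
n, m, square_size = 9, 6, 1.0
obj_pts = np.zeros((n * m, 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:n, 0:m].T.reshape(-1, 2) * square_size

# Made-up intrinsics and synthetic "detected" corners so the sketch is runnable;
# replace K_ir/K_rgb/dist and corners_* with your calibration and detector output.
K_ir = np.array([[580.0, 0, 320.0], [0, 580.0, 240.0], [0, 0, 1.0]])
K_rgb = np.array([[1050.0, 0, 640.0], [0, 1050.0, 480.0], [0, 0, 1.0]])
dist = np.zeros(5)
corners_ir, _ = cv2.projectPoints(obj_pts, np.array([0.1, 0.0, 0.0]),
                                  np.array([0.0, 0.0, 20.0]), K_ir, dist)
corners_rgb, _ = cv2.projectPoints(obj_pts, np.array([0.1, 0.0, 0.0]),
                                   np.array([-0.05, 0.0, 20.0]), K_rgb, dist)

# Pose of the same pattern (the "world") as seen from each camera.
_, rvec_ir, tvec_ir = cv2.solvePnP(obj_pts, corners_ir, K_ir, dist)
_, rvec_rgb, tvec_rgb = cv2.solvePnP(obj_pts, corners_rgb, K_rgb, dist)

def to_4x4(rvec, tvec):
    """Build a 4x4 homogeneous cam_M_world transform from rvec/tvec."""
    M = np.eye(4)
    M[:3, :3], _ = cv2.Rodrigues(rvec)
    M[:3, 3] = tvec.ravel()
    return M

ir_M_world = to_4x4(rvec_ir, tvec_ir)
rgb_M_world = to_4x4(rvec_rgb, tvec_rgb)

# Both poses are relative to the same grid, so they can be chained to relate the cameras.
rgb_M_ir = rgb_M_world @ np.linalg.inv(ir_M_world)
```

rgb_M_ir then maps points expressed in the IR camera frame into the RGB camera frame, which is the kind of relation the blog post describes.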
Back to your questions:
I have not used the GML toolbox, but I imagine it uses the same idea as explained above when the object points cannot be set manually (sometimes only the step size between points can be set). It also seems that it only supports chessboard patterns. If the process is fully automatic, the same 3D points are used each time, so you can relate both cameras. I hope this helps.