Metric reconstruction (one with true, real-world scale) is impossible from camera intrinsics alone.
For metric reconstruction, either the cameras' extrinsics must be known, or the captured scene must contain an object of known dimensions.
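To make the scale ambiguity concrete, here is a minimal NumPy sketch. All values (the intrinsics `K`, the poses, the points, the factor `s`) are made up for illustration, and `project` is a hypothetical helper, not part of any library. It shows that scaling the scene and the camera translation by the same factor leaves every image projection unchanged, and that one known distance in the scene is enough to fix the scale.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points X to pixel coordinates with camera K[R|t]."""
    x_cam = X @ R.T + t            # world -> camera coordinates
    x_img = x_cam @ K.T            # apply intrinsics
    return x_img[:, :2] / x_img[:, 2:3]

# Hypothetical intrinsics, shared by both views.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# View 1 at the origin, view 2 rotated 10 degrees about the y axis and shifted.
R1, t1 = np.eye(3), np.zeros(3)
theta = np.deg2rad(10.0)
R2 = np.array([[np.cos(theta), 0.0, np.sin(theta)],
               [0.0, 1.0, 0.0],
               [-np.sin(theta), 0.0, np.cos(theta)]])
t2 = np.array([-0.5, 0.0, 0.1])

# "True" scene points, in metres.
X = np.array([[0.0, 0.0, 4.0],
              [1.0, 0.0, 5.0],
              [0.0, 1.0, 6.0],
              [1.0, 1.0, 4.5]])

# A second scene: everything (points and baseline) scaled by an arbitrary s.
s = 3.7
X_scaled, t2_scaled = s * X, s * t2

# Both scenes produce identical images, so images alone cannot tell them apart.
same_view1 = np.allclose(project(K, R1, t1, X), project(K, R1, t1, X_scaled))
same_view2 = np.allclose(project(K, R2, t2, X), project(K, R2, t2_scaled, X_scaled))
print(same_view1, same_view2)

# Fixing the scale: suppose the real distance between points 0 and 1 is known.
known_length = np.linalg.norm(X[1] - X[0])              # measured in the real scene
reconstructed_length = np.linalg.norm(X_scaled[1] - X_scaled[0])
X_metric = X_scaled * (known_length / reconstructed_length)
print(np.allclose(X_metric, X))
```

Knowing the metric length of the baseline between the two cameras would play the same role as the known object length here.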
For more information, refer to chapter 10 of Multiple View Geometry in Computer Vision by Hartley & Zisserman. Sections 10.2 and 10.4.2 discuss this problem clearly.
From p. 254 of the second edition of the same textbook:
Without some knowledge of a scene’s placement with respect to a 3D coordinate
frame, it is generally not possible to reconstruct the absolute position or orientation
of a scene from a pair of views (or in fact from any number of views). This is true
independently of any knowledge which may be available about the internal parameters
of the cameras, or their relative placement. For instance the exact latitude and longitude
of the scene in figure 9.8(p248) (or any scene) cannot be computed, nor is it possible to
determine whether the corridor runs north-south or east-west. This may be expressed
by saying that the scene is determined at best up to a Euclidean transformation (rotation
and translation) with respect to the world frame.
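
The ambiguity described in the quote can be written out in one line. As a sketch, assume the usual notation (not taken from the quoted passage): $P_i = K_i [R_i \mid t_i]$ are the camera matrices, $X_j$ are homogeneous 3D points, and $H$ is an arbitrary Euclidean transformation of the world frame with rotation $R$ and translation $t$.

```latex
H = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix},
\qquad
x_{ij} = P_i X_j = \left(P_i H^{-1}\right)\left(H X_j\right),
\qquad
P_i H^{-1} = K_i \left[\, R_i R^\top \;\middle|\; t_i - R_i R^\top t \,\right]
```

The transformed cameras $P_i H^{-1}$ keep the same intrinsics $K_i$ and the transformed points $H X_j$ give exactly the same image points $x_{ij}$, so no number of views can distinguish the original scene from the moved one.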