An orthographic camera renders everything in the view without any distance scaling. For example, a unit sphere is the same size in the render whether it is 1 or 1,000 units from the camera. A common misconception is that an orthographic camera is a 2D camera, but it is not. It still renders in 3D, so it takes its own position and angle into account.
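Here is a small Python sketch that makes the "no distance scaling" point concrete. Python is used only for illustration; in Unity you would not compute this yourself. The sketch compares how much of the vertical view an object 1 unit tall covers under each camera type. The function and parameter names are my own. The defaults are assumed to mirror Unity's Camera.fieldOfView (vertical field of view, 60 degrees by default) and Camera.orthographicSize (half the vertical view height, 5 by default).

```python
import math


def perspective_view_fraction(object_height, distance, fov_deg=60.0):
    """Fraction of the vertical view an object covers under a perspective camera.

    The visible height at `distance` is 2 * distance * tan(fov / 2), so the
    fraction shrinks in proportion to distance.
    """
    visible_height = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)
    return object_height / visible_height


def orthographic_view_fraction(object_height, distance, ortho_size=5.0):
    """Fraction of the vertical view an object covers under an orthographic camera.

    The visible height is always 2 * ortho_size, so `distance` is accepted
    only to show that it has no effect.
    """
    visible_height = 2.0 * ortho_size
    return object_height / visible_height


if __name__ == "__main__":
    for d in (1.0, 1000.0):
        print(f"distance {d:>6}: perspective {perspective_view_fraction(1.0, d):.6f}, "
              f"orthographic {orthographic_view_fraction(1.0, d):.6f}")
```

Moving from a distance of 1 to a distance of 1,000 shrinks the perspective fraction by a factor of 1,000. The orthographic fraction stays at 0.1 in both cases.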
You have your camera looking down on the table. With the perspective camera, everything looks normal to the eye. With the orthographic camera, however, the perspective is taken out of the render, which confuses the eye.
Here is an example. These pictures come from a scene with three identically scaled cubes in a row and two cameras. The cubes are rotated 30, 60 and 90 degrees around the x-axis. Both cameras are in the same position. One is an orthographic camera and the other is a perspective camera.
This image is with the perspective camera (the left half is the Scene view, the right half is the Game view camera render). Because of the perspective, the eye can read the cubes as rotated, even though two of them look oddly stretched.

This image is with the orthographic camera (as before, the left half is the Scene view, the right half is the Game view camera render). Without perspective, the eye cannot tell that the cubes are rotated. Instead, they just look shortened from left to right.

(Note: In the above images, I purposely removed lighting cues.)
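Here is a rough Python sketch of why the orthographic render hides the rotation. It tilts a single square face around the x-axis and projects its corners both ways. It assumes a simple pinhole camera at the origin looking along +z, which is a simplification and not Unity's actual projection matrix. Under perspective, the near and far edges of the tilted face come out at different widths, and that convergence is the cue the eye reads as rotation. Under orthographic projection, both edges keep the same width and the face is only squashed along the tilt direction by a factor of cos(angle). The result just looks like a shorter rectangle. Which screen direction the squash lands on depends on where the camera sits.

```python
import math


def rotate_x(point, angle_deg):
    """Rotate a 3D point (x, y, z) around the x-axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def project_orthographic(point):
    """Orthographic projection: keep x and y, discard depth."""
    x, y, _ = point
    return (x, y)


def project_perspective(point, focal=1.0):
    """Pinhole perspective projection: divide by depth (camera at origin, looking along +z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def tilted_face(angle_deg, center_depth=5.0):
    """Corners of a 1x1 square face tilted around the x-axis, centered center_depth in front of the camera.

    Corner order: bottom-left, bottom-right, top-right, top-left.
    """
    corners = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
    rotated = [rotate_x(c, angle_deg) for c in corners]
    return [(x, y, z + center_depth) for (x, y, z) in rotated]


def edge_widths(projected):
    """Screen widths of the bottom edge and the top edge of the projected face."""
    bottom = projected[1][0] - projected[0][0]
    top = projected[2][0] - projected[3][0]
    return bottom, top


def face_height(projected):
    """Screen height of the projected face."""
    ys = [p[1] for p in projected]
    return max(ys) - min(ys)


if __name__ == "__main__":
    for angle in (0, 30, 60):
        face = tilted_face(angle)
        ortho = [project_orthographic(c) for c in face]
        persp = [project_perspective(c) for c in face]
        print(f"{angle:>2} deg  ortho edges {edge_widths(ortho)}  height {face_height(ortho):.3f}")
        print(f"{angle:>2} deg  persp edges {edge_widths(persp)}  height {face_height(persp):.3f}")
```

At 30 degrees, for example, both orthographic edges are exactly 1 unit wide. In the perspective projection, the nearer bottom edge comes out wider than the top edge.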
So, that is what is happening in your scene when using the orthographic camera. Your objects are not being scaled; the camera is rendering what it sees based on its own position and angle. Placing an orthographic camera is different from placing a perspective camera. I would suggest moving the camera down by reducing its y-axis position and making the view angle shallower by reducing its x-axis rotation. You will need to experiment with the values to get the right effect and view.
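A little vector math shows what the suggested camera adjustment does. The Python sketch below assumes, as in a default Unity setup, that the camera's x rotation is its pitch below the horizon (90 means looking straight down) and that y is world up. The function names and the table and object sizes are hypothetical. Under orthographic projection, a tabletop's depth D shows up on screen as D * sin(pitch), and an upright object's height H shows up as H * cos(pitch). The camera's position cancels out of these lengths, which is why moving an orthographic camera reframes the view without rescaling anything.

```python
import math


def camera_up_axis(pitch_deg):
    """Screen 'up' direction of a camera pitched pitch_deg degrees below the horizon.

    Assumption: the camera faces +z before pitching and y is world up, so
    pitch 90 looks straight down and screen 'up' becomes +z (the far side).
    """
    a = math.radians(pitch_deg)
    return (0.0, math.cos(a), math.sin(a))


def screen_y(point, cam_pos, pitch_deg):
    """Vertical screen coordinate of a world point under orthographic projection.

    Only the offset along the screen 'up' axis is kept; depth along the
    viewing direction is discarded.
    """
    up = camera_up_axis(pitch_deg)
    return sum((p - c) * u for p, c, u in zip(point, cam_pos, up))


def on_screen_extents(table_depth, object_height, cam_pos, pitch_deg):
    """Return (on-screen table depth, on-screen object height) in world units."""
    near_edge = screen_y((0.0, 0.0, 0.0), cam_pos, pitch_deg)
    far_edge = screen_y((0.0, 0.0, table_depth), cam_pos, pitch_deg)
    object_top = screen_y((0.0, object_height, 0.0), cam_pos, pitch_deg)
    return far_edge - near_edge, object_top - near_edge


if __name__ == "__main__":
    # Hypothetical scene: a table 2 units deep with a 1-unit-tall object on it.
    for pitch in (90, 60, 30):
        for cam_pos in ((0.0, 10.0, -5.0), (0.0, 3.0, -1.0)):
            depth, height = on_screen_extents(2.0, 1.0, cam_pos, pitch)
            print(f"pitch={pitch:>2} cam={cam_pos}: table depth {depth:.3f}, object height {height:.3f}")
```

Going from a pitch of 90 to 30 degrees halves the table's on-screen depth. It also exposes the sides of objects, with a height factor of cos 30, or about 0.87. That is the trade-off you will be tuning by hand.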