I guess it would take a whole book to explain all of this in detail, so you had better read one. For a start, maybe I can give you a few hints.
1) Is that all depth data is? Pixel (x, y) - distance in mm. Or is
there more to depth data.
Yes, that's all. Whether you get distances in physical units or just raw sensor values depends on your depth-sensing device and on whether there is a calibration step.
2) How is depth data represented in a file
Usually as a sequence of values. If the depth information is 2-dimensional, you usually have a depth map, which is basically a digital image: instead of brightness or colour information it holds distances. Beyond that, the question is too broad to be answered.
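To make the "digital image with distances" idea concrete, here is a minimal sketch. The 4x3 size, the 16-bit sample type, the millimetre unit and the use of 0 for "no measurement" are all assumptions for illustration; your device may do any of these differently.

```python
from array import array

WIDTH, HEIGHT = 4, 3  # tiny hypothetical depth map

# One unsigned 16-bit value per pixel, stored row by row (row-major).
# Values are assumed to be distances in mm; 0 is used here as "no measurement".
depth = array("H", [
    1200, 1210, 1225,    0,
    1190, 1205, 1230, 1300,
    1185, 1200, 1240, 1310,
])

def depth_at(x, y):
    """Return the stored value for pixel (x, y)."""
    return depth[y * WIDTH + x]

print(depth_at(2, 1))  # 1230
```

Written to disk, this is just the same sequence of values, optionally preceded by a header holding width, height and sample type, exactly as for a grayscale image.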
3) What all can it be used for?
For anything that utilizes depth information. This is too broad to be answered.
4) What is the most efficient way to save this data to file
That would depend on your depth resolution and on many other factors, such as whether you want to store brightness or colour as well. This is too broad to be answered.
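As a rough illustration of how depth resolution drives storage cost, the sketch below estimates how many bits one sample needs. The function name and the example ranges are hypothetical, and it assumes uniformly spaced depth levels starting at 0, which a real sensor need not have.

```python
def bits_per_sample(max_range_mm, resolution_mm):
    """Smallest number of bits that can label every depth level from 0 to max_range_mm."""
    levels = max_range_mm // resolution_mm + 1   # number of distinct values, including 0
    return max(1, (levels - 1).bit_length())     # bits needed for the largest label

print(bits_per_sample(8000, 1))    # 13
print(bits_per_sample(4000, 10))   # 9
print(bits_per_sample(255, 1))     # 8
```

Under these assumptions, an 8 m range at 1 mm steps does not fit in 8 bits but fits comfortably in a 16-bit sample, while a coarser sensor might get away with less.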
5) If saving 20 second video with depth data, wouldn't it get very
large and very slow to process? how to handle this
If you only capture depth, there is no difference from a grayscale video, provided both use the same quantization. What do you consider very large? What do you consider very slow to process? This is too broad to be answered.
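To put a number on "very large", here is a back-of-the-envelope calculation for raw, uncompressed video. The 640x480 resolution, 30 frames per second and 2 bytes per sample are assumed values, not something taken from your setup.

```python
def raw_video_bytes(width, height, bytes_per_sample, fps, seconds):
    """Uncompressed size: every frame stores one sample per pixel."""
    return width * height * bytes_per_sample * fps * seconds

depth_16bit = raw_video_bytes(640, 480, 2, 30, 20)  # 16-bit depth
gray_16bit = raw_video_bytes(640, 480, 2, 30, 20)   # 16-bit grayscale, same quantization
gray_8bit = raw_video_bytes(640, 480, 1, 30, 20)    # 8-bit grayscale

print(depth_16bit)                # 368640000
print(depth_16bit == gray_16bit)  # True
print(gray_8bit)                  # 184320000
```

So 20 seconds at these settings is roughly 369 MB before any compression, and the size depends only on resolution, frame rate and quantization, not on whether the samples mean depth or brightness.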