In forward rendering, you have to re-render the entire scene a bunch of times. In deferred rendering, you render the scene once and that's that.
Re-rendering the entire scene means re-doing a bunch of work, including but not limited to:
- CPU processing of the scene. That is, walking the scene graph and issuing rendering commands.
- GPU processing of state changes between drawing commands.
- GPU reading vertex arrays for meshes.
- Vertex shader execution.
All of that has to happen only once in deferred rendering.
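To put rough numbers on the list above, here is a minimal sketch of a cost model. Everything in it is made up for illustration: the `Scene` fields, the `geometry_work` helper, and the counts (500 draw calls, 200,000 vertices, 4 forward passes) are assumptions, not measurements. It also treats every vertex as one vertex shader run per pass, which ignores vertex caching.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    # Hypothetical numbers describing one traversal of the scene.
    draw_calls: int  # draw commands issued while walking the scene graph
    vertices: int    # vertices read and pushed through the vertex shader


def geometry_work(scene: Scene, geometry_passes: int) -> dict:
    """Count the geometry-side work repeated on every pass over the scene."""
    return {
        "draw_calls_issued": scene.draw_calls * geometry_passes,
        "vertex_shader_runs": scene.vertices * geometry_passes,
    }


scene = Scene(draw_calls=500, vertices=200_000)

# Assumed forward setup: 4 passes over the scene (say, a depth pre-pass
# followed by 3 lighting passes).
forward = geometry_work(scene, geometry_passes=4)

# Deferred: the scene geometry is submitted once.
deferred = geometry_work(scene, geometry_passes=1)

print("forward: ", forward)
print("deferred:", deferred)
```

With these assumed numbers the forward path issues 2,000 draw calls and 800,000 vertex shader runs against 500 and 200,000 for deferred. The real ratio depends entirely on how many times your forward renderer actually walks the scene.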
And don't forget that "early-z" isn't free. Not only do those triangles have to be generated, but they have to be rasterized too. They have to get far enough into the processing pipeline that they can be culled. It's not a huge deal, but it's not nothing either.
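So yes, a depth pre-pass in forward rendering means you're only running the fragment shader (FS) when absolutely necessary. But you're still repeating all of the per-pass work listed above: the CPU scene traversal, the state changes, the vertex reads, and the vertex shader runs.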