Agner Fog's microarch PDF explains decoding in detail, including what happens with multi-uop instructions.
If a multi-uop instruction isn't the first insn in the group being decoded this cycle, decoding for this cycle stops just before it. In the next cycle, decoding starts at the multi-uop insn, so it lands in the complex decoder, the only one that can handle multi-uop instructions.
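This is why a repeating pattern of instructions with 3-1-3-1 uops decodes better than a 3-3-1-1 pattern. In 3-1-3-1, every multi-uop insn is already first in its decode group. In 3-3-1-1, the first 3-uop insn has to decode alone, because the insn after it also needs the complex decoder.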
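A toy model of the grouping rule can make the 3-1-3-1 vs. 3-3-1-1 difference concrete. This is a sketch under my own assumptions, not Agner's model: 4 decoders where only decoder 0 accepts multi-uop instructions, an assumed cap of 4 uops per cycle across all decoders, and no modeling of 16-byte fetch blocks, pre-decode limits, or the uop cache. The function name `decode_cycles` and its parameters are hypothetical.

```python
def decode_cycles(uop_counts, num_decoders=4, max_uops_per_cycle=4):
    """Count cycles to decode a list of instructions, given uops per insn.

    Toy model (assumptions): decoder 0 is the only complex decoder, so a
    multi-uop insn that isn't first in this cycle's group ends the group.
    Total uops decoded per cycle are capped at max_uops_per_cycle.
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        uops = 0  # uops produced so far this cycle
        slot = 0  # which decoder the next insn would go to
        while i < n and slot < num_decoders:
            u = uop_counts[i]
            if slot > 0 and u > 1:
                break  # multi-uop insn waits for the complex decoder next cycle
            if slot > 0 and uops + u > max_uops_per_cycle:
                break  # this cycle's uop budget is used up
            uops += u
            slot += 1
            i += 1
    return cycles


for name, pattern in [("3-1-3-1", [3, 1, 3, 1]), ("3-3-1-1", [3, 3, 1, 1])]:
    # 100 instructions of each pattern
    print(name, decode_cycles(pattern * 25))
# 3-1-3-1 50
# 3-3-1-1 75
```

In this model, 3-3-1-1 decodes as groups of [3], [3,1], [1] because 3+1+1 would exceed the assumed 4-uop cap. If you raise `max_uops_per_cycle` high enough that it never binds, both patterns take 50 cycles here, so in this toy model the per-cycle uop limit is what makes the 3-3-1-1 ordering lose.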
The pre-decoders only mark instruction lengths/boundaries. They don't yet know which insns will decode to multiple uops. That requires actually decoding the instructions, so there's no way to shuffle the instruction stream around to send the complex instructions to the complex decoder.
This is why instruction ordering matters when you're bottlenecked on the decoders. For CPUs with a uop cache, decode performance isn't usually critical. If it is, you have a code-size issue: your hot code doesn't fit in the uop cache. It's hopefully rare for code to run often enough for its performance to matter, but infrequently enough that it isn't hot in the uop cache.