I think of Masking() as masking whole time steps, while Embedding(mask_zero=True) is more of a filter on the input data.
Masking:
The documentation says: "If all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking)."
The mask_value is arbitrary. Thus, you can decide to skip time steps in which there is no input, or which meet some other condition that you encode with a sentinel value in your data.
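To make the rule concrete, here is a minimal NumPy sketch of the check described above. It is not Keras itself: masking_rule is a hypothetical helper that only mimics the "all values equal mask_value" test, and the value -1 is an arbitrary choice for illustration.

```python
import numpy as np

def masking_rule(x, mask_value=0.0):
    """Hypothetical helper: True where a timestep is kept, False where it is masked."""
    # A timestep is kept if at least one of its features differs from mask_value.
    return np.any(x != mask_value, axis=-1)

# One sample, three timesteps, two features per timestep.
x = np.array([[[1.0, 2.0],
               [-1.0, -1.0],   # every feature equals mask_value -> masked
               [-1.0, 3.0]]])  # only one feature equals mask_value -> kept

mask = masking_rule(x, mask_value=-1.0)
print(mask)  # [[ True False  True]]
```

Note that the third timestep is kept: a single matching feature is not enough, the whole timestep has to equal mask_value.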
For Embedding, the mask is laid over the input itself: calculations are skipped for every input whose value is 0. This way, a single time step can propagate all of its data, part of its data, or no data through the network. This is not a masking of time step #3 or something like that; it is a masking of input data #i. Also, only the absence of input (index 0) can be masked; you cannot pick another value, and index 0 can then no longer be used for a real vocabulary entry.
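The element-wise nature of this mask can be sketched in NumPy as well. Again, this is an illustration under assumptions rather than the Keras code: mask_zero_rule is a hypothetical helper that simply tests each index against 0, and the second example assumes an input with several indices per time step.

```python
import numpy as np

def mask_zero_rule(ids):
    """Hypothetical helper: True where an index is kept, False where it is 0."""
    # The test is applied to every index separately; nothing is reduced per timestep.
    return ids != 0

# Two padded sequences of token ids, one id per timestep; 0 is the padding index.
ids = np.array([[5, 2, 0, 0],
                [7, 0, 3, 0]])
id_mask = mask_zero_rule(ids)
print(id_mask.tolist())

# Assumed layout: one sample, two timesteps, three ids per timestep.
multi_ids = np.array([[[4, 0, 9],    # part of the data is masked
                       [0, 0, 0]]])  # all of the data is masked
multi_mask = mask_zero_rule(multi_ids)
print(multi_mask.tolist())
```

The resulting mask has the same shape as the index tensor, which is what lets one time step carry full, partial, or no data.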
Thus, there are certainly cases where the two are completely equivalent (for example, when mask_value is 0 and an input being 0 at a time step means all inputs at that time step are 0), but they operate at different resolutions.