Instead of drawing the graph twice, which might add some overhead, you could set matplotlib.rcParams['legend.handlelength'] = 0. This is a global parameter, so it affects every graph drawn afterwards.
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
matplotlib.rcParams['legend.handlelength'] = 0
x = np.linspace(-np.pi/2, np.pi/2, 31)
y = np.cos(x)**3
# 1) remove points where y > 0.7
x2 = x[y <= 0.7]
y2 = y[y <= 0.7]
# 2) mask points where y > 0.7
y3 = np.ma.masked_where(y > 0.7, y)
# 3) set to NaN where y > 0.7
y4 = y.copy()
y4[y > 0.7] = np.nan
plt.plot(x*0.1, y, 'o-', color='lightgrey', label='No mask')
plt.plot(x2*0.4, y2, 'o-', label='Points removed')
plt.plot(x*0.7, y3, 'o-', label='Masked values')
plt.plot(x*1.0, y4, 'o-', label='NaN values')
plt.legend()
plt.title('Masked and NaN data')
plt.show()

If you only want it for one graph, wrap the code that draws that graph in:
with plt.rc_context({"legend.handlelength": 0}):
    ...  # plotting code for that graph, including plt.legend()
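As a rough sketch of how the scoped version could look, here is a cut-down variant of the plot above inside rc_context. The Agg backend line and the variable names (default_handlelength, inside_handlelength, restored_handlelength) are my own additions for illustration, not part of the original code:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi / 2, np.pi / 2, 31)
y = np.cos(x) ** 3

# Remember the value in effect before the temporary override.
default_handlelength = matplotlib.rcParams["legend.handlelength"]

# The override is only active inside the with block.
with plt.rc_context({"legend.handlelength": 0}):
    fig, ax = plt.subplots()
    ax.plot(x, y, "o-", label="No mask")
    # Create the legend inside the block so it picks up the overridden value.
    legend = ax.legend()
    inside_handlelength = matplotlib.rcParams["legend.handlelength"]

# After the block, the previous setting is back in place.
restored_handlelength = matplotlib.rcParams["legend.handlelength"]
print(inside_handlelength, restored_handlelength == default_handlelength)
```

The legend has to be created inside the with block. Any graph created after the block gets the usual handle length again.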
EDIT: The other answer has a better solution for per-graph legends.