I would like to use patsy's dmatrix function to generate a design matrix in which rows with NaN values are preserved. For example, the following code returns a design matrix with four rows, which is what one would normally want. However, in this case I would like dmatrix to return a matrix with five rows, where the first row has a NaN value in it.
import numpy as np
import pandas as pd
from patsy import dmatrix
df = pd.DataFrame({'x1': np.arange(5), 'x2': np.arange(5)})
dmatrix("~x1+x2.diff()", df)
Alternatively, I would settle for an answer that lets me retrieve the row numbers that were dropped or retained. In the example above, row 1 is the row that was dropped, while rows 2-5 were retained.
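For reference, here is a minimal sketch of both outcomes described above. It assumes a patsy version that exposes NAAction and the return_type="dataframe" option; the variable names (dropped_na, keep_na, retained, dropped) are only illustrative. Note that the pandas index is 0-based, so "row 1" in the description corresponds to index label 0.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix, NAAction

df = pd.DataFrame({'x1': np.arange(5), 'x2': np.arange(5)})

# Default NA handling: the row with NaN is dropped. Asking for a DataFrame
# keeps the original index on the surviving rows, so the dropped rows can
# be recovered by comparing against df.index.
dropped_na = dmatrix("~x1+x2.diff()", df, return_type="dataframe")
retained = dropped_na.index
dropped = df.index.difference(retained)

# Assumption: an NAAction with an empty NA_types list treats no value as
# missing, so no rows are removed and the NaN stays in the matrix.
keep_na = dmatrix("~x1+x2.diff()", df,
                  NA_action=NAAction(NA_types=[]),
                  return_type="dataframe")

print(list(retained), list(dropped))
print(keep_na.shape)
```

If this behaves as expected, keep_na has five rows with NaN in the x2.diff() column of the first row, and retained / dropped hold the index labels of the rows that the default call kept and removed.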