I'm trying to use Hypothesis to generate pandas dataframes in which some column values depend on the values in other columns. So far, I haven't been able to 'link' two columns.
This code snippet:
from hypothesis import strategies as st
from hypothesis.extra.pandas import data_frames, column, range_indexes

def create_dataframe():
    id1 = st.integers().map(lambda x: x)
    id2 = st.shared(id1).map(lambda x: x * 2)
    df = data_frames(index=range_indexes(min_size=10, max_size=100), columns=[
        column(name='id1', elements=id1, unique=True),
        column(name='id2', elements=id2),
    ])
    return df
It produces a dataframe whose second column is constant:
            id1  program_id
0  1.170000e+02       110.0
1  3.600000e+01       110.0
2  2.876100e+04       110.0
3 -1.157600e+04       110.0
4  5.300000e+01       110.0
5  2.782100e+04       110.0
6  1.334500e+04       110.0
7 -3.100000e+01       110.0
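
For reference, st.shared draws a single value per generated example and reuses it wherever the shared strategy appears, which would explain why every row of the second column is identical. One possible way to link the columns, offered only as a rough sketch, is to generate just id1 and derive id2 from the finished frame with .map(). The helper name linked_dataframes, the integer bounds and the explicit int64 dtype are assumptions made for illustration and are not part of the snippet above.

```python
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames, range_indexes


def linked_dataframes():
    """Strategy for frames where id2 is always id1 * 2 (hypothetical helper)."""
    base = data_frames(
        index=range_indexes(min_size=10, max_size=100),
        columns=[
            column(
                name="id1",
                # Assumed bounds: keep values small so doubling fits in int64.
                elements=st.integers(min_value=-10**6, max_value=10**6),
                dtype="int64",
                unique=True,
            ),
        ],
    )
    # Derive the dependent column from the generated one, row by row.
    return base.map(lambda df: df.assign(id2=df["id1"] * 2))
```

Used with @given(linked_dataframes()), each generated frame should then satisfy df["id2"] == df["id1"] * 2 in every row. If the dependency is more involved than a simple transformation, passing a rows= strategy that builds each row as a whole may be a better fit.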