Hypothesis strategy for multiple pandas series/columns with no duplicates

Question

I would like to define a strategy to generate multiple pandas columns which are row-wise unique.

For example, the following two columns would be unique, as there are no duplicates of the two columns combined, even though there are duplicates within the columns themselves.

These columns, however, would not be unique in this sense:

For a single column this is possible with the unique kwarg. However, it is not obvious how to generate multiple columns that are unique in combination. This would be useful for generating a MultiIndex, for example. Is there a good ready-made workaround that anyone is aware of?

When you say "multiple columns that would be unique", unique with respect to what? No column is equal to another column? No column contains any element which is in another column? Something else? — Zac Hatfield-Dodds, Apr 12 '21 at 06:31
Good point -- I have clarified what I meant in the question. — Kosmonaut, Apr 13 '21 at 07:46

score 3 · Accepted Answer · answered Apr 14 '21 at 00:48

Based on the examples in your question, I think you mean "columns such that there is no row which is a permutation of any other row".

(The simpler "such that there are no duplicate rows" is also satisfied by your second example.)
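To make the difference between the two readings concrete, here is a minimal sketch in plain Python. The function names and the sample rows are made up for illustration; they are not the tables from the question.

```python
def has_duplicate_rows(rows):
    """True if any row appears more than once, compared position by position."""
    rows = [tuple(row) for row in rows]
    return len(set(rows)) != len(rows)


def has_permuted_rows(rows):
    """True if any row is a reordering of another row (exact duplicates included)."""
    # Sorting each row gives a canonical key that ignores column order.
    keys = [tuple(sorted(row)) for row in rows]
    return len(set(keys)) != len(keys)


# Hypothetical sample data: (1, 2) and (2, 1) differ as rows,
# but one is a permutation of the other.
rows = [(1, 2), (2, 1), (1, 3)]
print(has_duplicate_rows(rows))  # False
print(has_permuted_rows(rows))   # True
```

The sorted-tuple key used in has_permuted_rows is the same kind of canonicalisation that a unique_by function would apply.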

In this case, I'd probably turn to the basic lists() strategy:

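from hypothesis.strategies import integers, lists, tuples

lists(
    elements=tuples(integers(), integers()),  # one tuple per row, one element per column
    unique_by=lambda row: tuple(sorted(row)),  # or otherwise canonicalise
).map(turn_into_a_dataframe)  # turn_into_a_dataframe: your own function, not part of Hypothesis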
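For completeness, here is one way the snippet could be fleshed out into something runnable. This is a sketch under assumptions, not the only way to do it: the body of turn_into_a_dataframe, the column names "a" and "b", and the integer bounds are all illustrative choices (the bounds just keep values comfortably inside int64).

```python
import pandas as pd
from hypothesis import given, strategies as st


def turn_into_a_dataframe(rows):
    # Hypothetical helper: the column names "a" and "b" are arbitrary.
    return pd.DataFrame(rows, columns=["a", "b"])


row_unique_frames = st.lists(
    # Bounds are an assumption for this sketch, chosen to stay within int64.
    elements=st.tuples(
        st.integers(min_value=-1000, max_value=1000),
        st.integers(min_value=-1000, max_value=1000),
    ),
    # Rows count as equal when they contain the same values in any order.
    unique_by=lambda row: tuple(sorted(row)),
).map(turn_into_a_dataframe)


@given(row_unique_frames)
def test_no_row_is_a_permutation_of_another(df):
    keys = [tuple(sorted(row)) for row in df.itertuples(index=False)]
    assert len(keys) == len(set(keys))
```

If you only need "no duplicate rows" rather than "no permuted rows", passing unique=True to lists() instead of unique_by should be enough, since the elements are already tuples.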

great thanks! i appreciate the work yous are putting into this project! :) — Kosmonaut, Apr 14 '21 at 19:19
