I would like to define a strategy to generate multiple pandas columns which are row-wise unique.

For example, the following two columns would be unique, as there are no duplicates of the two columns combined, even though there are duplicates within the columns themselves.

   c0  c1
0   1   1
1   1   2
3   2   2

These columns, however, would not be unique in this sense:

   c0  c1
0   3   1
1   2   2
3   1   3

This is possible for a single column with the unique kwarg; however, it is not obvious how to generate multiple columns that are unique in combination. This would be useful for generating a MultiIndex, for example. Is there a good ready-made workaround that anyone is aware of?

Kosmonaut
  • When you say "multiple columns that would be unique", unique with respect to what? No column is equal to another column? No column contains any element which is in another column? Something else? – Zac Hatfield-Dodds Apr 12 '21 at 06:31
  • Good point -- I have clarified what I meant in the question. – Kosmonaut Apr 13 '21 at 07:46

1 Answer

Based on the examples in your question, I think you mean "columns such that there is no row which is a permutation of any other row".

(The simpler "such that there are no duplicate rows" is also satisfied by your second example.)
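To make the distinction concrete, here is a minimal sketch in plain Python that checks both examples from the question against each reading. The helper names canonical and rows_unique_up_to_permutation are made up for illustration, and the rows are written as tuples rather than DataFrame rows.

```python
def canonical(row):
    """Order-insensitive key: rows that are permutations of each other collide."""
    return tuple(sorted(row))


def rows_unique_up_to_permutation(rows):
    """True if no row is a permutation of another row."""
    keys = [canonical(row) for row in rows]
    return len(keys) == len(set(keys))


first_example = [(1, 1), (1, 2), (2, 2)]
second_example = [(3, 1), (2, 2), (1, 3)]

print(rows_unique_up_to_permutation(first_example))   # True
print(rows_unique_up_to_permutation(second_example))  # False: (3, 1) vs (1, 3)

# The second example has no exact duplicate rows, so the simpler
# definition would accept it.
print(len(set(second_example)) == len(second_example))  # True
```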

In this case, I'd probably turn to the basic lists() strategy:

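from hypothesis.strategies import integers, lists, tuples

# turn_into_a_dataframe is a placeholder for your own rows -> DataFrame function
lists(
    elements=tuples(integers(), integers()),  # one tuple per row, one element per column
    unique_by=lambda row: tuple(sorted(row)),  # or otherwise canonicalise
).map(turn_into_a_dataframe)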
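For completeness, a self-contained sketch of how this might look end to end, assuming Hypothesis and pandas are installed. The names canonical, rows_to_frame and frames, the column labels c0 and c1, and the bounds 0 to 5 and max_size=10 are all illustrative choices, not part of the question; the small integer range is only there so that collisions would be likely if unique_by were missing.

```python
import pandas as pd
from hypothesis import given
from hypothesis.strategies import integers, lists, tuples


def canonical(row):
    """Order-insensitive key, so (3, 1) and (1, 3) count as the same row."""
    return tuple(sorted(row))


def rows_to_frame(rows):
    """Hypothetical stand-in for turn_into_a_dataframe."""
    return pd.DataFrame(rows, columns=["c0", "c1"])


# Each list element is one row; unique_by rejects rows whose canonical
# form has already been drawn.
frames = lists(
    elements=tuples(integers(0, 5), integers(0, 5)),
    unique_by=canonical,
    max_size=10,
).map(rows_to_frame)


@given(frames)
def test_rows_are_unique_up_to_permutation(df):
    keys = [canonical(row) for row in df.itertuples(index=False)]
    assert len(keys) == len(set(keys))
```

If only exact duplicate rows should be excluded, passing unique=True to lists() in place of unique_by should be enough. For a MultiIndex, the mapping function could build one with pd.MultiIndex.from_tuples(rows, names=["c0", "c1"]) instead of a DataFrame.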
Zac Hatfield-Dodds