I'm trying to use upsetplot to find the intersections between the columns of a dataframe. I'm using code adapted from an example provided by the developers of this library:
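import matplotlib.pyplot as plt
import pandas as pd
from upsetplot import from_indicators, plot

# data is my dataframe; a cell counts as a member of its column when it is not NaN
plot(from_indicators(indicators=pd.notna, data=data), show_counts=True)
plt.show()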
The code above gives me a graph with the count of non-empty (not NaN) cells in each column of the dataframe. Instead of counting with notna, I would like to count the "core" items, meaning the items that several columns have in common.
From this dataframe (numbers changed to letters for this example), my code above would give me:
        column_1  column_2  column_3  column_4  column_5
row_1   A         A                   A
row_2   B                   B         B
row_3                       C
row_4   D         D                   D
row_5   E                   E
row_6                                           F
...a graph sort of like this:
column_1 : **** (4 not_empty)
column_3, column_4 : *** (3 not_empty each)
column_2 : ** (2 not_empty)
column_5 : * (1 not_empty)
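For reference, these per-column totals can be reproduced without upsetplot. This is only a minimal sketch: the dataframe is rebuilt by hand from the example table above, and I'm assuming the empty cells are None/NaN.

```python
import pandas as pd

# Example table rebuilt by hand; None marks an empty cell (assumption).
data = pd.DataFrame(
    {
        "column_1": ["A", "B", None, "D", "E", None],
        "column_2": ["A", None, None, "D", None, None],
        "column_3": [None, "B", "C", None, "E", None],
        "column_4": ["A", "B", None, "D", None, None],
        "column_5": [None, None, None, None, None, "F"],
    },
    index=["row_1", "row_2", "row_3", "row_4", "row_5", "row_6"],
)

# Number of non-empty cells per column, largest first.
counts = data.notna().sum().sort_values(ascending=False)
print(counts)
```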
But actually what I want is a graph with some information like this:
column_1, column_2, column_4 : ** (A, D in_common)
column_1, column_3, column_4 : * (B in_common)
column_1, column_3 : * (E in_common)
column_5 : - (F not_in_common)
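To make the target concrete, here is a rough sketch of that grouping in plain pandas (not upsetplot). It assumes that each row holds a single item repeated across columns and that empty cells are None/NaN; shared_items is just a name I made up for illustration. Note that it also reports C, which appears only in column_3.

```python
import pandas as pd

# Example table rebuilt by hand; None marks an empty cell (assumption).
data = pd.DataFrame(
    {
        "column_1": ["A", "B", None, "D", "E", None],
        "column_2": ["A", None, None, "D", None, None],
        "column_3": [None, "B", "C", None, "E", None],
        "column_4": ["A", "B", None, "D", None, None],
        "column_5": [None, None, None, None, None, "F"],
    },
    index=["row_1", "row_2", "row_3", "row_4", "row_5", "row_6"],
)


def shared_items(df):
    """Map each combination of non-empty columns to the items found there."""
    groups = {}
    for _, row in df.iterrows():
        present = row.dropna()  # keep only the columns this row appears in
        if present.empty:
            continue
        key = tuple(present.index)  # e.g. ("column_1", "column_2", "column_4")
        groups.setdefault(key, []).append(present.iloc[0])
    return groups


for columns, items in shared_items(data).items():
    print(", ".join(columns), ":", "*" * len(items), items)
```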
Does anyone have an idea how to replace "pd.notna" with another piece of code that would give me the graph I'm looking for? Thanks in advance!