
I'm trying to find a substring in a frozenset, but I'm running out of options.

My data structure is a pandas DataFrame (it's the output of `association_rules` from the mlxtend package, if you are familiar with that one), and I want to print all the rows where the antecedents (a frozenset) include a specific string.

Sample data: (screenshot not included)

    print(rules[rules["antecedents"].str.contains('line', regex=False)])

However, whenever I run it, I get an empty DataFrame.

When I run only the inner expression on the series `rules["antecedents"]`, I get False for every entry. Why is that?
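Since the screenshot is missing, here is a minimal reproducible example of the kind of data described above (the MCVE asked for in the comments). It is a sketch under assumptions: the item names and confidence values are invented, and only the layout, with frozensets of items in the `antecedents` and `consequents` columns as mlxtend's `association_rules` produces, is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the output of mlxtend's association_rules():
# item names and confidence values are invented for illustration only.
rules = pd.DataFrame({
    "antecedents": [frozenset({"pipeline"}), frozenset({"line", "bolt"}), frozenset({"nut"})],
    "consequents": [frozenset({"bolt"}), frozenset({"nut"}), frozenset({"line"})],
    "confidence": [0.8, 0.6, 0.5],
})

# Every cell in "antecedents" is a frozenset, not a str,
# and the column itself has the generic object dtype.
print({type(v) for v in rules["antecedents"]})
print(rules["antecedents"].dtype)
```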

Edited by smci; asked by ch1ll.
  • Have you tried `rules[rules["antecedents"].str.contains('line', regex=False) == True]`? – Alejandro Alcalde Mar 28 '19 at 16:30
  • Because `rules["antecedents"]` is not a string, but a frozenset. Hence `rules["antecedents"].str` is a wrong attempt to get a string representation of a frozenset, which as @adrtam says only gives NaN (arguably it should raise Exception). Since you didn't supply an MCVE, can you please dump the actual value of `rules["antecedents"]`? – smci Mar 28 '19 at 17:53
  • @ElBaulP yes, I did; I found this answer in another Stack Overflow post, sorry for not mentioning it. To me it seems to do exactly the same thing. @smci The actual type of `rules["antecedents"]` is `pandas.core.series.Series`. What does MCVE stand for? I couldn't find a useful result on Google. And thank you for prettifying my question. – ch1ll Mar 28 '19 at 20:09

1 Answer


Because the `Series.str.*` methods are meant for string data only. Since your values are not strings, `.str.contains` never looks at their string representation, and for values it cannot handle it returns NaN. To prove it:

    >>> import numpy as np
    >>> import pandas as pd
    >>> x = pd.DataFrame(np.random.randn(2, 5)).astype("object")
    >>> x
             0         1         2          3          4
    0 -1.17191  -1.92926 -0.831576 -0.0814279   0.099612
    1 -1.55183 -0.494855   1.14398   -1.72675 -0.0390948
    >>> x[0].str.contains("-1")
    0   NaN
    1   NaN
    Name: 0, dtype: float64
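Why would the frozensets in the question give False rather than NaN? A simplified, hypothetical model of what `.str.contains(pat, regex=False)` does per element on an object column may help: pandas tries the plain-Python test `pat in value` and substitutes NaN when that raises. For a float, `in` raises TypeError, hence NaN; for a frozenset, `in` succeeds as an exact membership test, which would explain the all-False result. `contains_like_pandas` is an illustrative name, not a pandas function, and real pandas differs in detail (newer versions may, for example, refuse the `.str` accessor on some non-string columns altogether).

```python
import math

def contains_like_pandas(value, pat):
    """Hypothetical, simplified model of Series.str.contains(pat, regex=False)
    applied to one element of an object column: try `pat in value` and
    fall back to NaN if that operation is not supported for the value."""
    try:
        return pat in value
    except (TypeError, AttributeError):
        return math.nan

print(contains_like_pandas(-1.17191, "-1"))                   # nan: a float is not a container
print(contains_like_pandas(frozenset({"pipeline"}), "line"))  # False: no element equals "line"
print(contains_like_pandas(frozenset({"line"}), "line"))      # True: exact element match
```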

What you can do:

Use apply:

    >>> x[0].apply(lambda x: "-1" in str(x))
    0    True
    1    True
    Name: 0, dtype: bool

So your code should be:

    print(rules[rules["antecedents"].apply(lambda x: 'line' in str(x))])
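A self-contained sketch of that fix on a small hypothetical rules frame (the item names are invented). Note that the `'pipeline'` row matches too, because the test is a substring search on the text that `str(x)` produces.

```python
import pandas as pd

# Hypothetical rules frame; item names and values are invented for illustration.
rules = pd.DataFrame({
    "antecedents": [frozenset({"pipeline"}), frozenset({"line", "bolt"}), frozenset({"nut"})],
    "confidence": [0.8, 0.6, 0.5],
})

# Turn each frozenset into its string form and search that text for "line".
mask = rules["antecedents"].apply(lambda s: "line" in str(s))
print(rules[mask])
```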

You might want to use `'line' in x` instead if you mean an exact match on an element.
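The difference between searching the string form, searching each element, and matching an element exactly can be seen in plain Python; the item names here are hypothetical. One caveat of `str(x)`: it searches the whole representation, including the `frozenset({...})` wrapper text, so a pattern such as `'froz'` would match every row, while testing each element with `any(...)` avoids that.

```python
# Hypothetical frozenset of item names, as found in one "antecedents" cell.
items = frozenset({"pipeline", "bolt"})

# 1. Substring of the whole string representation, i.e. "frozenset({...})".
print("line" in str(items))                    # True
print("froz" in str(items))                    # True, although no item contains "froz"

# 2. Substring of any individual element.
print(any("line" in item for item in items))   # True

# 3. Exact match on an element.
print("line" in items)                         # False
```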

Edited by smci; answered by adrtam.
  • Thank you for this! I stumbled upon a similar solution today, but it looks like I got something wrong. Applying lambda functions to DataFrames like this looks very powerful; maybe I should look into it a bit more. – ch1ll Mar 28 '19 at 20:11
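Since applying lambdas came up: a sketch, on a hypothetical frame with invented item names, of two equivalent ways to build the same row mask without `.apply`. One uses `Series.map` with a lambda; the other uses a plain list comprehension, which works because pandas accepts a list of booleans of matching length as a row selector.

```python
import pandas as pd

# Hypothetical rules frame; item names are invented for illustration.
rules = pd.DataFrame({
    "antecedents": [frozenset({"pipeline"}), frozenset({"line", "bolt"}), frozenset({"nut"})],
})

# Series.map calls the lambda once per cell, just like Series.apply does here.
mask_map = rules["antecedents"].map(lambda s: any("line" in item for item in s))

# A list comprehension over the column builds the same mask as a plain list.
mask_list = [any("line" in item for item in s) for s in rules["antecedents"]]

print(rules[mask_map])
print(rules[mask_list])
```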