
Objective: to run association rules on a binary-valued dataset.

import pandas as pd

d = {'col1': [0, 0, 1], 'col2': [1, 0, 0], 'col3': [0, 1, 1]}
df = pd.DataFrame(data=d)

This produces a DataFrame of 0s and 1s for the corresponding column values.

The problem is when I make use of code like the following:

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
frequent_itemsets = apriori(pattern_dataset, min_support=0.50, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules

Typically this runs just fine, but this time it raised an error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-46ec6f572255> in <module>()
      4 frequent_itemsets = apriori(pattern_dataset, min_support=0.50,use_colnames=True)
      5 frequent_itemsets
----> 6 rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
      7 rules

D:\AnaConda\lib\site-packages\mlxtend\frequent_patterns\association_rules.py in association_rules(df, metric, min_threshold, support_only)
    127     values = df['support'].values
    128     frozenset_vect = np.vectorize(lambda x: frozenset(x))
--> 129     frequent_items_dict = dict(zip(frozenset_vect(keys), values))
    130 
    131     # prepare buckets to collect frequent rules

D:\AnaConda\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
   1970             vargs.extend([kwargs[_n] for _n in names])
   1971 
-> 1972         return self._vectorize_call(func=func, args=vargs)
   1973 
   1974     def _get_ufunc_and_otypes(self, func, args):

D:\AnaConda\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
   2040             res = func()
   2041         else:
-> 2042             ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
   2043 
   2044             # Convert args to object arrays first

D:\AnaConda\lib\site-packages\numpy\lib\function_base.py in _get_ufunc_and_otypes(self, func, args)
   1996             args = [asarray(arg) for arg in args]
   1997             if builtins.any(arg.size == 0 for arg in args):
-> 1998                 raise ValueError('cannot call `vectorize` on size 0 inputs '
   1999                                  'unless `otypes` is set')
   2000 

ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

These are the dtypes of the DataFrame; any help would be appreciated.

col1    int64
col2    int64
col3    int64
dtype: object
  • Well, in this case the frequent_itemsets result was empty at min_support=0.50. Lowering the value produces itemsets for the association rules to be applied to. – Student Oct 27 '18 at 21:18

2 Answers

    128     frozenset_vect = np.vectorize(lambda x: frozenset(x))
--> 129     frequent_items_dict = dict(zip(frozenset_vect(keys), values))

Here np.vectorize wraps the frozenset(x) function in code that can take an array or list (keys) and pass each element to it for evaluation. It is a kind of numpy iteration (convenient, but not fast). But to determine what kind (dtype) of array it should return, it performs a test run with the first element of keys. An alternative to this test run is to supply the otypes parameter.
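
A minimal sketch of just that numpy behavior, independent of mlxtend, showing that the test run is what fails on an empty input and that otypes sidesteps it:

import numpy as np

frozenset_vect = np.vectorize(lambda x: frozenset(x))

try:
    # size-0 input: there is no first element to do the test run on
    frozenset_vect(np.array([], dtype=object))
except ValueError as e:
    print(e)  # cannot call `vectorize` on size 0 inputs unless `otypes` is set

# with otypes set, no test run is needed and an empty result comes back
safe_vect = np.vectorize(lambda x: frozenset(x), otypes=[object])
print(safe_vect(np.array([], dtype=object)))  # -> array([], dtype=object)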

Anyway, in this particular run keys is evidently empty: a size-0 array or list. np.vectorize could return a result array of the same (empty) shape, but it still has to set a dtype. Hence the error.

Evidently the code writer never anticipated the case where keys is empty, so you need to tackle the question of why it is empty.

We need to look at the association_rules code to see how keys is set. Its use in line 129 suggests that it has the same number of elements as values, which is derived from the df with:

values = df['support'].values

If keys has 0 elements, then values does as well, and df has 0 'rows'.

What is the size of frequent_itemsets?
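
A sketch of such a check, using the names from the question (the 0.20 retry threshold is only an illustrative value):

from mlxtend.frequent_patterns import apriori, association_rules

frequent_itemsets = apriori(pattern_dataset, min_support=0.50, use_colnames=True)
print(len(frequent_itemsets))   # 0 in the questioner's case: no itemset reaches 50% support

# an empty frame is what trips the np.vectorize call inside association_rules
if frequent_itemsets.empty:
    frequent_itemsets = apriori(pattern_dataset, min_support=0.20, use_colnames=True)

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)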

I added an mlxtend tag because the error arises in its code. You/we need to examine that code or its documentation to determine why this DataFrame is empty.

hpaulj
  • Thanks, @hpaulj, this issue is now raised on the developer's GitHub: [rasbt](https://github.com/rasbt/mlxtend/issues/496). I hope your proposed solution helps solve the issue. – Taiwotman Feb 28 '19 at 16:43
  • Downgrading pandas 1.1.0 to 1.0.1 solved the issue for me (numpy version is 1.19.1). – ismail Jul 30 '20 at 12:32

Workaround:

def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

yourdataset_sets = yourdataset.applymap(encode_units)

frequent_itemsets = apriori(yourdataset_sets, min_support=0.001, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

Credit: saeedesmaili
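
For illustration, a sketch of this workaround applied to the toy frame from the question: with such a low min_support, every itemset that occurs at least once in this small example is kept, so frequent_itemsets is not empty and association_rules has something to work with.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

d = {'col1': [0, 0, 1], 'col2': [1, 0, 0], 'col3': [0, 1, 1]}
yourdataset = pd.DataFrame(data=d)

def encode_units(x):
    # force a strict 0/1 encoding
    if x <= 0:
        return 0
    if x >= 1:
        return 1

yourdataset_sets = yourdataset.applymap(encode_units)

frequent_itemsets = apriori(yourdataset_sets, min_support=0.001, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print(rules)   # e.g. rules between col1 and col3, the only columns that co-occur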

Taiwotman