The docs have an example of how to find association rules in a sample .basket file:

import Orange
data = Orange.data.Table("market-basket.basket")

rules = Orange.associate.AssociationRulesSparseInducer(data, support=0.3) 

The .basket file looks like this:

Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola

I would like to use the same approach, but I do not know how to create a basket from my data. Is there a way to create a basket using Orange.data.Table from a string which contains the same data as a file?

t1maccapp
  • Possible duplicate of http://stackoverflow.com/questions/8986719/how-do-i-create-a-new-data-table-in-orange –  Feb 28 '16 at 19:46
  • They use tables, but I would like to use baskets. @user1114496 suggested saving my own data in a .basket file and then using it to create **Orange.data.Table**. I hope there is a more elegant way! – t1maccapp Feb 29 '16 at 07:05
  • Basket is just a file format. What you want is a table with string meta attributes. Do you want to do it for Orange 2 or 3? This part really changed. – JanezD Feb 29 '16 at 10:18
  • @JanezD I use Orange 2 and I've already done my task using Orange.associate.AssociationRulesSparseInducer with a matrix table where each row has ones and zeros (meaning that a particular row has or does not have a column feature). But this solution runs really slowly with my data (7000x16000 matrix table). I thought that if I created a 'basket' table instead of a matrix table, it would run faster. – t1maccapp Feb 29 '16 at 10:30
  • The slowness likely stems from the Apriori algorithm. Use [Orange3-Associate](https://pypi.python.org/pypi/Orange3-Associate). – K3---rnc Feb 29 '16 at 13:58
  • Can you try 1's and undefineds (?) instead of 1's and 0's? Otherwise apriori also tries to construct rules like `a=1 and b=0 -> c=0`. – JanezD Mar 01 '16 at 10:30

2 Answers

In Orange 3, there is an add-on Orange3-Associate (the add-on doesn't require installing Orange 3), which you can use with your data in either numpy.array, scipy.sparse, or plain list of lists form.

http://orange3-associate.readthedocs.org/en/latest/

Also, the algorithm the add-on uses (FP-growth) is much faster than the one in Orange 2 (Apriori).
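As a rough illustration of the "plain list of lists" form mentioned above, here is a minimal sketch in plain Python (no Orange dependency) that turns basket-style text into integer-coded transactions. The helper names `parse_baskets` and `encode_transactions` are hypothetical, and integer coding is an assumption about what the add-on expects; check its documentation before passing the result to its frequent-itemset functions.

```python
# Sample data copied from the question's .basket file.
basket_text = """Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola"""


def parse_baskets(text):
    """Split each non-empty line on commas and strip whitespace (hypothetical helper)."""
    return [[item.strip() for item in line.split(",") if item.strip()]
            for line in text.splitlines() if line.strip()]


def encode_transactions(baskets):
    """Map each distinct item name to an integer id, in order of first appearance."""
    item_ids = {}
    encoded = []
    for basket in baskets:
        encoded.append([item_ids.setdefault(item, len(item_ids)) for item in basket])
    return encoded, item_ids


baskets = parse_baskets(basket_text)
transactions, item_ids = encode_transactions(baskets)

# Keep the reverse mapping so mined itemsets can be translated back to names.
id_to_item = {i: name for name, i in item_ids.items()}
print(transactions)
```

Each inner list holds only the items present in that basket, so nothing is stored for absent items.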

K3---rnc

In Orange 2, this converts your string into a sparse data table like the one you would get from a basket file.

import re
import Orange

# Matches one item name (a run of word characters).
word = re.compile(r"\w+")

s = """Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola"""

# Every distinct item becomes an optional continuous meta attribute.
all_items = set(word.findall(s))
domain = Orange.data.Domain([])
domain.add_metas({Orange.orange.newmetaid(): Orange.feature.Continuous(n)
                  for n in all_items}, True)

# One instance per line; only the items present in that line are set.
data = Orange.data.Table(domain)
for e in s.splitlines():
    ex = Orange.data.Instance(domain)
    for m in word.findall(e):
        ex[m] = 1
    data.append(ex)

It assumes that each item appears only once in a line. The last argument to add_metas, True, marks these attributes as optional; without it, the matrix wouldn't be sparse.
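To see what sparseness saves here, the following plain-Python sketch (no Orange involved, and not how Orange stores data internally) counts the cells a dense 0/1 matrix would need for the sample baskets against the entries kept when only the present items are stored. The variable names are illustrative.

```python
import re

basket_text = """Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola"""

word = re.compile(r"\w+")

# One set of item names per basket line.
rows = [set(word.findall(line)) for line in basket_text.splitlines()]
items = sorted(set().union(*rows))

# Dense form: one 0/1 cell for every (row, item) pair.
dense = [[1 if item in row else 0 for item in items] for row in rows]
dense_cells = sum(len(r) for r in dense)

# Sparse form: only the items actually present are stored.
sparse_cells = sum(len(row) for row in rows)

print(dense_cells, sparse_cells)  # prints: 30 18
```

The gap grows quickly with the number of distinct items, which is consistent with the slowdown reported for the 7000x16000 matrix in the comments.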

JanezD
  • This is exactly what I was looking for! :) Now that I use string meta attributes instead of a matrix of ones and zeros, it works sufficiently fast. Thanks for helping me with it! – t1maccapp Mar 04 '16 at 10:27