How to create a basket from a string in Orange?

Question

The docs have an example of how to find association rules in a sample .basket file:

import Orange
data = Orange.data.Table("market-basket.basket")

rules = Orange.associate.AssociationRulesSparseInducer(data, support=0.3)

The .basket file looks like this:

Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola

I would like to use the same approach, but I do not know how to create a basket from my own data. Is there a way to create a basket with Orange.data.Table from a string that contains the same data as the file?

Possible duplicate of http://stackoverflow.com/questions/8986719/how-do-i-create-a-new-data-table-in-orange — , Feb 28 '16 at 19:46
They use tables, but I would like to use baskets. @user1114496 suggested saving my data to a .basket file and then using it to create **Orange.data.Table**. I hope there is a more elegant way! — t1maccapp, Feb 29 '16 at 07:05
Basket is just a file format. What you want is a table with string meta attributes. Do you want to do it for Orange 2 or 3? This part really changed. — JanezD, Feb 29 '16 at 10:18
@JanezD I use Orange 2 and I've already done my task using Orange.associate.AssociationRulesSparseInducer with a matrix table where each row holds ones and zeros (meaning that the row does or does not have the column's feature). But this solution runs really slowly on my data (a 7000x16000 matrix table). I thought that if I created a 'basket' table instead of a matrix table, it would run faster. — t1maccapp, Feb 29 '16 at 10:30
The slowness likely stems from the Apriori algorithm. Use [Orange3-Associate](https://pypi.python.org/pypi/Orange3-Associate). — K3---rnc, Feb 29 '16 at 13:58
Can you try 1's and undefineds (?) instead of 1's and 0's? Otherwise apriori also tries to construct rules like `a=1 and b=0 -> c=0`. — JanezD, Mar 01 '16 at 10:30

score 3 · Answer 1 · answered Feb 29 '16 at 13:57


For Orange 3, there is an add-on, Orange3-Associate (it doesn't require installing Orange 3), which accepts your data as a numpy.array, a scipy.sparse matrix, or a plain list of lists.

http://orange3-associate.readthedocs.org/en/latest/

Also, the algorithm the add-on uses (FP-growth) is much faster than the one in Orange 2 (Apriori).
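
As a rough illustration of the "plain list of lists" form, the basket string can be turned into integer-coded transactions in pure Python. This is only a sketch: the helper name `encode_transactions` is made up for this example, and it assumes the add-on wants items as integers, which is worth checking against the linked documentation.

```python
def encode_transactions(text):
    """Turn basket-style text into integer-coded transactions.

    Returns (transactions, names), where names[i] is the label of item i.
    The function name and return shape are assumptions for illustration.
    """
    codes = {}          # item label -> integer code
    transactions = []
    for line in text.splitlines():
        items = [part.strip() for part in line.split(",") if part.strip()]
        if not items:
            continue    # skip blank lines
        # setdefault hands out the next free code the first time an item is seen
        row = {codes.setdefault(item, len(codes)) for item in items}
        transactions.append(sorted(row))
    names = sorted(codes, key=codes.get)
    return transactions, names


s = """Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola"""

transactions, names = encode_transactions(s)
```

If the add-on is installed, `transactions` could then presumably be passed to `frequent_itemsets` from `orangecontrib.associate.fpgrowth`, and `names` used to translate the integer items in the results back to labels. See the add-on's documentation for the exact signature.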

K3---rnc

Thank you for suggesting to use FP-growth. I will try to use it with the next task! – t1maccapp Mar 04 '16 at 10:23

score 1 · Accepted Answer · answered Mar 01 '16 at 10:59

In Orange 2, the following converts your string into a sparse data table like the one you would get from a basket file.

import re
import Orange

# Matches one item name (a run of letters, digits, or underscores).
word = re.compile(r"\w+")

s = """Bread, Milk
Bread, Diapers, Beer, Eggs
Milk, Diapers, Beer, Cola
Bread, Milk, Diapers, Beer
Bread, Milk, Diapers, Cola"""

# Create one optional meta attribute per distinct item.
all_items = set(word.findall(s))
domain = Orange.data.Domain([])
domain.add_metas({Orange.orange.newmetaid(): Orange.feature.Continuous(n)
                  for n in all_items}, True)

# Build one instance per line, setting only the items present in that line.
data = Orange.data.Table(domain)
for e in s.splitlines():
    ex = Orange.data.Instance(domain)
    for m in word.findall(e):
        ex[m] = 1
    data.append(ex)

It assumes that each item appears only once in a line. The last argument to add_metas, True, marks these attributes as "optional". Without it, the matrix wouldn't be sparse.
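
One caveat: the `\w+` pattern in the code above would split a multi-word item such as "Ice Cream" into two items, and a repeated item is just set to 1 again. If that matters for your data, one option is to split on commas and count repeats before filling the table. The sketch below is plain Python and does not touch Orange; `parse_basket` is a hypothetical helper name, not part of any library.

```python
from collections import Counter


def parse_basket(text):
    """Parse basket-style text into one Counter (item -> count) per line."""
    rows = []
    for line in text.splitlines():
        # Split on commas so items containing spaces stay in one piece.
        items = [part.strip() for part in line.split(",")]
        rows.append(Counter(item for item in items if item))
    return rows


rows = parse_basket("Bread, Milk\nIce Cream, Milk, Milk")

# Distinct item names, which would become the meta attributes.
all_items = sorted(set().union(*rows))
```

Each Counter could then stand in for the inner loop of the answer, presumably with `ex[item] = count` in place of `ex[m] = 1`.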

This is exactly what I was looking for! :) Now that I use string meta attributes instead of a matrix of ones and zeros, it works sufficiently fast. Thanks for helping me with it! — t1maccapp, Mar 04 '16 at 10:27
