Unsupervised discretization to convert continuous into categorical for frequent item set mining

Question

I am using the arules package to mine frequent itemsets in a large dataset, but I cannot find a suitable method for discretization.

As the examples in the arules package show, several basic unsupervised methods are available through the `discretize` function. However, I would like to estimate the optimal number of categories for my large dataset, which seems more reasonable than assigning the number by hand.

Can you give me some good advice on this? Thanks.

@Michael Hahsler

score 0 · Accepted Answer · answered Jan 31 '18 at 16:44


I think there is little guidance on this for unsupervised discretization. You can look at the histogram of each variable and decide manually. For k-means, you could use internal validation techniques (e.g., the elbow method) to find k. For supervised discretization, there are methods that help you decide. Maybe someone else can help here.
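As a rough illustration of the elbow idea for a single numeric variable, here is a minimal sketch in base R. The vector `x` is synthetic stand-in data (an assumption, not the asker's dataset), and the range of candidate k values is arbitrary.

```r
# Elbow heuristic for choosing k in 1-D k-means discretization.
# `x` is synthetic stand-in data (assumption), with three well-separated groups.
set.seed(42)
x <- c(rnorm(500, mean = 0), rnorm(500, mean = 5), rnorm(500, mean = 10))

# Candidate numbers of categories (arbitrary range for illustration).
ks <- 1:8

# Total within-cluster sum of squares for each candidate k.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Plot the curve and look for the k after which it flattens out.
plot(ks, wss, type = "b",
     xlab = "number of categories k",
     ylab = "total within-cluster sum of squares")
```

The "elbow" is read off the plot by eye, so this still involves a manual judgment; it only narrows down which values of k are worth trying.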


Michael Hahsler


Thanks for the reply. My data is too big, so when I use k-means I get the warning message: Quick-TRANSfer stage steps exceeded maximum (= 93441300) – Pan Jan 31 '18 at 18:06
Take a sample, apply k-means discretization with `onlycuts=TRUE`, and then use the `fixed` method with the returned cuts on all the data. – Michael Hahsler Feb 01 '18 at 18:55
Thanks for the reply. With your method, I still have to estimate the optimal number of categories, am I right? – Pan Feb 01 '18 at 21:36
Yes, you have to specify the number. – Michael Hahsler Feb 02 '18 at 21:55
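
The sample-then-fixed approach from the comments above might look roughly like the sketch below. It assumes a recent arules version in which the number of categories is passed as `breaks` (older releases used `categories`); the data, the sample size of 5000, and the choice of 3 categories are all made up for illustration.

```r
library(arules)

set.seed(1)
# Synthetic stand-in for one large numeric column (assumption).
x <- c(rnorm(1e5, mean = 0), rnorm(1e5, mean = 5), rnorm(1e5, mean = 10))

# 1. Learn the cut points with k-means on a random sample only.
#    The sample size and the number of categories are illustrative choices.
s <- sample(x, 5000)
cuts <- discretize(s, method = "cluster", breaks = 3, onlycuts = TRUE)

# 2. Widen the outer cut points so that values outside the sample's
#    range do not become NA when applied to the full data.
cuts[1] <- -Inf
cuts[length(cuts)] <- Inf

# 3. Apply the fixed cut points to the full column.
x_cat <- discretize(x, method = "fixed", breaks = cuts)
table(x_cat)
```

Running k-means on the sample keeps the clustering step small, which should avoid the Quick-TRANSfer warning, while the `fixed` step is just a cheap binning pass over all the data.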
