
I need to split a set of transactions into groups. My data is in a text file in this format:

T1  17  20  22  35  37  60  62    
T2  39  51  53  54  57  65  73    
T3  17  20  21  22  34  37  62    
T4  20  22  54  57  65  73  45    
T5  20  54  57  65  73  75  80    
T6  2   20  54  57  59  63  71    
T7  2   20  22  57  59  71  66    
T8  17  20  28  29  30  34  35    
T9  16  20  28  32  54  57  65    
T10 16  20  22  28  57  59  71    
-    
-

and so on, over 5000 lines. Each line represents one transaction.

What I did so far:

library("arules")   # read.transactions(), dissimilarity()
library("cluster")  # pam()
txIn <- read.transactions("data2.txt", format = "basket", sep = " ")
d <- dissimilarity(txIn, method = "jaccard")
clustersA <- pam(d, k = 100)
txOut <- "txOut.txt"
write.table(clustersA$clustering, file = txOut, sep = " ")

but the file stores each transaction number with its cluster, like:

"x"
"1" 1
"2" 1
"3" 1
"4" 1
"5" 1
"6" 2
"7" 2
"8" 2
"9" 1
"10" 2
-
-

and I need to save it as, for example:

cluster 1:

T1  17  20  22  35  37  60  62    
T2  39  51  53  54  57  65  73    
T3  17  20  21  22  34  37  62    
T4  20  22  54  57  65  73  45    
T5  20  54  57  65  73  75  80
T9  16  20  28  32  54  57  65

cluster 2:

T6  2   20  54  57  59  63  71    
T7  2   20  22  57  59  71  66    
T8  17  20  28  29  30  34  35        
T10 16  20  22  28  57  59  71    
    -
    -

and so on, because I want to deal with each cluster individually.
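One way to get that layout is to keep the raw lines of the input file and group them by the cluster labels. The following is a minimal base-R sketch, not a tested solution for the full data set. It assumes the labels in `clustersA$clustering` are in the same order as the lines of `data2.txt` (no blank or skipped lines), and it hard-codes the ten sample lines and the labels shown above so that it runs on its own. The names `tx_lines`, `groups`, and `out_file` are illustrative.

```r
# Ten sample transactions from the question, kept as raw text lines.
# With the real file this would be: tx_lines <- readLines("data2.txt")
tx_lines <- c(
  "T1  17  20  22  35  37  60  62",
  "T2  39  51  53  54  57  65  73",
  "T3  17  20  21  22  34  37  62",
  "T4  20  22  54  57  65  73  45",
  "T5  20  54  57  65  73  75  80",
  "T6  2   20  54  57  59  63  71",
  "T7  2   20  22  57  59  71  66",
  "T8  17  20  28  29  30  34  35",
  "T9  16  20  28  32  54  57  65",
  "T10 16  20  22  28  57  59  71"
)

# Cluster labels as shown in the question's output file.
# With the real clustering this would be: clustering <- clustersA$clustering
clustering <- c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2)

# split() groups the lines by label and keeps their order within each group.
groups <- split(tx_lines, clustering)

# Build the output: a "cluster k:" header, that cluster's lines, a blank line.
out <- unlist(lapply(names(groups), function(k) {
  c(paste0("cluster ", k, ":"), groups[[k]], "")
}))

# Written to a temporary directory here; any path would do.
out_file <- file.path(tempdir(), "clusters.txt")
writeLines(out, out_file)
```

After this, `groups[["1"]]` and `groups[["2"]]` hold the lines of each cluster, so each one can be processed separately.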

I have searched a lot; any information, example, or documentation would help.

Brian Tompsett - 汤莱恩
Meem

1 Answer


Are you sure you want to do clustering?

To me, it sounds like you might be more interested in frequent itemset mining.
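To give a rough idea of what frequent itemset mining looks for, the sketch below counts how many of the ten sample transactions from the question contain each pair of items. This is brute-force pair counting in base R for illustration, not Apriori or any particular library's algorithm, and the threshold `min_support` is an arbitrary assumption.

```r
# The ten sample transactions from the question, as item vectors.
tx <- list(
  c(17, 20, 22, 35, 37, 60, 62),
  c(39, 51, 53, 54, 57, 65, 73),
  c(17, 20, 21, 22, 34, 37, 62),
  c(20, 22, 54, 57, 65, 73, 45),
  c(20, 54, 57, 65, 73, 75, 80),
  c(2, 20, 54, 57, 59, 63, 71),
  c(2, 20, 22, 57, 59, 71, 66),
  c(17, 20, 28, 29, 30, 34, 35),
  c(16, 20, 28, 32, 54, 57, 65),
  c(16, 20, 22, 28, 57, 59, 71)
)

# For each transaction, list every pair of its items as a key such as "54,57".
# Sorting first makes the key independent of item order in the file.
pair_keys <- unlist(lapply(tx, function(items) {
  combn(sort(unique(items)), 2, FUN = function(p) paste(p, collapse = ","))
}))

# Support of a pair = number of transactions that contain both items.
support <- table(pair_keys)

# Keep pairs that occur in at least min_support transactions (assumed cutoff).
min_support <- 4
frequent_pairs <- sort(support[support >= min_support], decreasing = TRUE)
print(frequent_pairs)
```

Unlike a hard clustering, a transaction can contribute to several frequent pairs at once.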

Has QUIT--Anony-Mousse
  • Yes, I want to cluster them. I tried to use clara for clustering, but the problem with clara is that it uses a distance function to calculate similarity, and I believe we can't treat distance as similarity when the data are transactions. – Meem Nov 10 '13 at 01:47
  • Your last sentence does not make sense to me. Clara can work with any dissimilarity. But you really need to consider *when* you want objects to cluster... this is not at all obvious with transactions, because say you have customer 1 buying `tomato spaghetti`, customer 2 buying `beer diapers` and customer 3 buying all four of these items. So how do you want to cluster them? Frequent itemset mining makes much more sense, as customer 3 can then include both frequent itemsets at the same time. – Has QUIT--Anony-Mousse Nov 10 '13 at 12:02
  • I see your point about frequent itemsets; it is interesting, so thanks a lot. I was trying to cluster the transactions into groups so that I can apply association rule mining algorithms to each group individually, but I'm not sure about using clara on transactional data. Here: http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/clara.html you can see metric = "euclidean" or "manhattan". How can we apply those to transactional data like your example? That is why I used pam with the Jaccard function. Thanks a lot. – Meem Nov 11 '13 at 00:25
  • I don't use R. I don't see why Clara should be restricted to these two distances, except that the authors of that R function didn't bother to implement others. R is, unfortunately, rather bad at re-using code. But maybe it will also accept a distance matrix? – Has QUIT--Anony-Mousse Nov 11 '13 at 07:38
  • Either way, R is quite limited and slow on such tasks. You may want to try ELKI which is much more flexible wrt. distance functions and often also much faster. It also has tons of clustering algorithms. You may be able to load this data set with `TermFrequencyParser`, then either use the `SparseVectorFieldFilter` (to make dense distance functions happy; it roughly means computing the maximum dimensionality) or use sparse distance functions. Jaccard might be similar to Cosine similarity on sparse vectors. Then try e.g. DBSCAN and OPTICS clustering. – Has QUIT--Anony-Mousse Nov 11 '13 at 07:43
  • Thank you for introducing me to ELKI. I really appreciate your help. – Meem Nov 11 '13 at 20:25
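
The three-customer example from the comments can be made concrete with the Jaccard dissimilarity, the measure used with `pam` in the question. This is a small base-R sketch; `jaccard_dist` is a helper defined here for illustration, not a function from `arules` or `cluster`.

```r
# Jaccard dissimilarity between two item sets:
# 1 - |intersection| / |union|
jaccard_dist <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

c1 <- c("tomato", "spaghetti")  # customer 1
c2 <- c("beer", "diapers")      # customer 2
c3 <- c(c1, c2)                 # customer 3 buys all four items

d12 <- jaccard_dist(c1, c2)  # no shared items: 1
d13 <- jaccard_dist(c1, c3)  # 2 shared out of 4: 0.5
d23 <- jaccard_dist(c2, c3)  # 2 shared out of 4: 0.5
print(c(d12 = d12, d13 = d13, d23 = d23))
```

Customer 3 is equally far (0.5) from both other customers, while those two share nothing (1), so a hard clustering has no obvious place to put customer 3.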