I wrote an implementation of the Apriori data mining algorithm. It works well on small test data, but I am having trouble running it on bigger data sets.
I am trying to generate rules for items that are frequently bought together.
My small test data is 5 transactions and 10 products.
My big test data is 11 million transactions and around 2700 products.
Problem: computing min-support and filtering out non-frequent items.
Let's say we are interested in items whose frequency is 60% or more.
frequency = 0.60;
When I compute min-support for the small data set with a 60% frequency, the algorithm removes all items that were bought fewer than 3 times (5 * 0.60 = 3).
Min-support = numberOfTransactions * frequency;
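To make the computation concrete, here is a minimal sketch in Python. The toy transactions and the names `min_support_count` and `frequent_items` are made up for illustration; this is not the actual implementation.

```python
from collections import Counter
import math


def min_support_count(num_transactions, frequency):
    """Smallest number of transactions an item must appear in to be kept."""
    # The small epsilon guards against floating-point noise in the product,
    # so that 5 * 0.60 is treated as exactly 3.
    return math.ceil(num_transactions * frequency - 1e-9)


def frequent_items(transactions, frequency):
    """First Apriori pass: keep single items that reach the min-support count."""
    threshold = min_support_count(len(transactions), frequency)
    # set(t) makes sure an item is counted at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= threshold}


# Hypothetical toy data: 5 transactions, 4 distinct products.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "eggs"],
]

print(min_support_count(5, 0.60))                  # 3
print(sorted(frequent_items(transactions, 0.60)))  # ['bread', 'milk']

# The same formula applied to 11 million transactions:
print(min_support_count(11_000_000, 0.60))         # 6600000
print(min_support_count(11_000_000, 0.0005))       # 5500
```

With 11 million transactions, a frequency of 0.60 means an item has to appear in 6,600,000 transactions to survive, while 0.0005 corresponds to 5,500 transactions.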
When I try the same thing on the large data set, however, the algorithm filters out almost all itemsets after the first iteration; only a couple of items meet the threshold.
So I started lowering the threshold further and further, running the algorithm many times. But not even 5% gives the desired results. I had to lower my frequency to 0.0005 to get at least 50% of the items through the first iteration.
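Instead of rerunning the whole algorithm with different thresholds, the threshold that lets a given share of items through the first iteration could be read directly off the item counts. The following is only a sketch under that assumption; the function name `threshold_for_share` and the toy data are hypothetical.

```python
from collections import Counter
import math


def threshold_for_share(transactions, share):
    """Largest relative support at which at least `share` of the
    distinct items survive the first pass."""
    n = len(transactions)
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    ranked = sorted(counts.values(), reverse=True)
    # Number of items that must survive (at least one).
    keep = max(1, math.ceil(len(ranked) * share - 1e-9))
    # The support of the last item we want to keep is the threshold.
    return ranked[keep - 1] / n


# Hypothetical toy data: item counts are bread=4, milk=3, butter=2, eggs=1.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "eggs"],
]

print(threshold_for_share(transactions, 0.5))  # 0.6 (keeps bread and milk)
print(threshold_for_share(transactions, 1.0))  # 0.2 (keeps every item)
```

This needs a single counting pass over the transactions, so it can be run once on the large data set before choosing the frequency.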
What do you think about this situation: could it be a data problem, since the data is generated artificially (a version of Microsoft AdventureWorks)? Or is it a problem with my code or with the min-support computation?
Maybe you can suggest another solution or a better way of doing this?
Thanks!