I have a pretty huge data set of products and users and their time of usage.
There are around 100 million rows and takes around 10 GB of disk space.
the dataset is on the following order:
userid itemid purchase_date
1 1 2018-12-22
11 1 2018-12-22
11 4 2018-12-22
12 4 2018-12-22
11 5 2018-12-22
.......100M+ rows.....
I also added such index,
ALTER TABLE purchase_data ADD INDEX (userid);
ALTER TABLE purchase_data ADD INDEX (itemid);
ALTER TABLE purchase_data ADD INDEX (purchase_date);
Let say I want to find all the users who purchased a product(item 1) and then find all the other item he purchased.
Select itemid from purchase_data
where userid in (Select userid, from purchase_data
where itemid=1)
and itemid!=1
this query takes forever to run.
Secondly, I also want to add up all the common items between those users like userid 11 4 and userid 12 also brought 4 so I want to add 4 with count 2
I wrote a similar query for that:
Select itemid,count(*) from purchase_data
where userid in (Select userid, from purchase_data
where itemid=1)
and itemid!=1
group by itemid
having count(itemid)>=1
This script also takes infinite time.
Please help,
Thank You