Frequent pattern mining on a dataset with a very large number of columns (data dimensions = 23 x 305269) always results in a dead kernel error

Question

I'm trying to apply frequent pattern mining (FPM) algorithms to biological data, where rows represent samples and columns represent SNPs (location, position). I'm working in a Jupyter notebook.

First, I imported the necessary packages:

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth

Then I manipulated my .vcf files to extract the SNPs for each patient (sample) and encoded the data in the boolean format required by the different FPM algorithms, so that the data end up in this form:

ID	chr1-1237	chr1-156790	chr2-5467878
sample1	True	False	True
sample2	False	False	True
sample3	True	True	False
:::::
sample23	False	True	True

Here the rows are the samples and the columns are the SNPs that each patient has. The actual dataset is 23 x 305269, and whenever I try to generate frequent patterns using the following code:

#Generate the frequent itemsets using apriori
frequent_itemsets = apriori(Samples_encoded, min_support=0.7, use_colnames=True)\
.sort_values("support",ascending=False)
frequent_itemsets

OR

#Generate the frequent itemsets using FP-growth
fpgrowth(Samples_encoded, min_support=0.6, use_colnames=True)

I always end up with the following error message: "The kernel appears to have died. It will restart automatically."
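To narrow down where the memory goes, a quick check along the following lines might help. This is only a sketch: the toy frame stands in for the real 23 x 305269 table, and `frequent_columns` is a hypothetical helper name, not part of mlxtend. It counts how many SNP columns meet `min_support` on their own, since only those columns can appear in any frequent itemset.

```python
import pandas as pd

# Toy stand-in for Samples_encoded (the real table is 23 x 305269).
Samples_encoded = pd.DataFrame(
    {
        "chr1-1237": [True, False, True],
        "chr1-156790": [False, False, True],
        "chr2-5467878": [True, True, False],
    },
    index=["sample1", "sample2", "sample3"],
)


def frequent_columns(df, min_support):
    """Return the columns whose individual support is >= min_support."""
    # Mean of a boolean column = fraction of samples in which the SNP is True.
    support = df.mean(axis=0)
    return support[support >= min_support].index.tolist()


kept = frequent_columns(Samples_encoded, min_support=0.6)
print(len(kept), "of", Samples_encoded.shape[1], "columns meet the threshold")
print("memory (bytes):", Samples_encoded.memory_usage(deep=True).sum())
```

If a large share of the 305269 columns pass this check, the boolean table itself is probably not the problem; the number of itemsets built from those columns would be.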

Is this because I have too many columns? Is there an algorithm that can handle this? Or is it a RAM problem? Should I upgrade my RAM? I already have 16 GB.
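For a sense of scale, here is a rough back-of-the-envelope sketch using only the standard library. It assumes the worst case in which every column clears the threshold on its own, which may not hold for the real data; the helper names are made up for illustration. In Apriori, the candidate pairs are all pairs of individually frequent columns, and the count for larger sizes is an upper bound.

```python
import math

n_samples = 23
n_columns = 305269


def min_samples_needed(min_support, n_samples):
    """Smallest number of samples an itemset must cover to reach min_support."""
    return math.ceil(min_support * n_samples)


def candidate_itemsets(n_frequent_columns, size):
    """Upper bound on candidate itemsets of a given size over the frequent columns."""
    return math.comb(n_frequent_columns, size)


# Samples an itemset must appear in at the two thresholds used above.
print(min_samples_needed(0.7, n_samples))
print(min_samples_needed(0.6, n_samples))

# Worst case: every column is frequent, so every pair of columns is a candidate.
print(candidate_itemsets(n_columns, 2))
```

With only 23 samples, many columns are likely to share the same True/False pattern, so the number of frequent itemsets can grow combinatorially with the column count regardless of how much RAM is installed.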
