I'm trying to apply frequent pattern mining (FPM) algorithms to biological data, where rows represent samples and columns represent SNPs (location, position). I'm working in a Jupyter notebook.
First, I imported the necessary packages:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth
Then I manipulated my .vcf files to extract the SNPs for each patient (sample) and encoded the data in the boolean format required by the different FPM algorithms, so that the data end up in the form of
ID | chr1-1237 | chr1-156790 | chr2-5467878 |
---|---|---|---|
sample1 | True | False | True |
sample2 | False | False | True |
sample3 | True | True | False |
... | ... | ... | ... |
sample23 | False | True | True |
where rows are the samples and columns are the SNPs that each patient has. The actual dataset is 23 x 305269 (23 samples, 305269 SNP columns). Whenever I try to generate frequent patterns using either of the following pieces of code:
#Generate the frequent itemsets using apriori
frequent_itemsets = apriori(Samples_encoded, min_support=0.7, use_colnames=True)\
.sort_values("support",ascending=False)
frequent_itemsets
OR
#Generate the frequent itemsets using FP-growth
fpgrowth(Samples_encoded, min_support=0.6, use_colnames=True)
I always end up with the following error message:
The kernel appears to have died. It will restart automatically.
Is this because I have too many columns? Is there an algorithm that can handle this? Or is this a RAM problem? Should I upgrade my RAM? I already have 16 GB.
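For reference, here is a minimal sketch of the encoding step on toy data, so the shape of Samples_encoded is reproducible. The dictionary `snps_per_sample` and its contents are hypothetical stand-ins for what I extract from the .vcf files; the real extraction code is different and much larger.

```python
# Hypothetical toy input: sample ID -> set of SNP labels ("chrom-position").
# In the real notebook these sets come from parsing the .vcf files.
snps_per_sample = {
    "sample1": {"chr1-1237", "chr2-5467878"},
    "sample2": {"chr2-5467878"},
    "sample3": {"chr1-1237", "chr1-156790"},
}

# Every SNP seen in at least one sample becomes one column.
all_snps = sorted(set().union(*snps_per_sample.values()))

# One boolean row per sample: True if the sample carries that SNP.
encoded = {
    sample: {snp: snp in snps for snp in all_snps}
    for sample, snps in snps_per_sample.items()
}

for sample, row in encoded.items():
    print(sample, row)
```

Passing `encoded` to `pd.DataFrame.from_dict(encoded, orient="index")` should give a boolean frame laid out like the table above, which is the kind of input I hand to apriori and fpgrowth.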