I'm trying to apply frequent pattern mining (FPM) algorithms to biological data, where rows represent samples and columns represent SNPs (location, position). I'm working in a Jupyter notebook.

First, I imported the necessary packages:

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth

Then I manipulated my .vcf files to extract the SNPs for each patient (sample) and encoded the data in a boolean format, as required by the different FPM algorithms, so that the data end up in the form:

ID chr1-1237 chr1-156790 chr2-5467878
sample1 True False True
sample2 False False True
sample3 True True False
:::::
sample23 False True True

where rows are the samples and columns are the SNPs that each patient has. The actual dataset is 23 x 305269. Whenever I try to generate frequent patterns using the following piece of code:

# Generate the frequent itemsets using apriori
frequent_itemsets = apriori(Samples_encoded, min_support=0.7, use_colnames=True)\
.sort_values("support",ascending=False)
frequent_itemsets

OR

# Generate the frequent itemsets using FP-growth
fpgrowth(Samples_encoded, min_support=0.6, use_colnames=True)

I always end up with the following error message: "The kernel appears to have died. It will restart automatically."

Is this because I have too many columns? Is there an algorithm that can handle this? Or is this a RAM problem? Should I upgrade my RAM? I already have 16 GB.
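To make the scale of the problem concrete, here is a minimal sketch on toy data with the same layout as above. The values and the fourth sample are made up for illustration, `Samples_encoded` here is only a stand-in for the real 23 x 305269 frame, and the names `item_support`, `frequent_items`, and `reduced` are hypothetical. It counts how many single SNPs meet the support threshold and how many candidate pairs that implies, which is the quantity that grows quickly when many columns are frequent.

```python
import pandas as pd

# Toy stand-in for the real boolean frame (hypothetical values).
Samples_encoded = pd.DataFrame(
    {
        "chr1-1237": [True, False, True, True],
        "chr1-156790": [False, False, True, True],
        "chr2-5467878": [True, True, False, True],
    },
    index=["sample1", "sample2", "sample3", "sample4"],
)

min_support = 0.7

# Support of a single SNP = fraction of samples that carry it.
item_support = Samples_encoded.mean(axis=0)

# SNPs that are frequent on their own (frequent 1-itemsets).
frequent_items = item_support[item_support >= min_support]
n_frequent = len(frequent_items)

# Number of 2-item candidates that can be formed from the frequent SNPs.
n_candidate_pairs = n_frequent * (n_frequent - 1) // 2

# Drop infrequent columns; they cannot appear in any frequent itemset.
reduced = Samples_encoded.loc[:, frequent_items.index]

print(n_frequent, n_candidate_pairs, reduced.shape)
```

Dropping the infrequent columns does not change the mining result at the same `min_support`, because every subset of a frequent itemset must itself be frequent. If `n_frequent` is still very large on the real data, the number of candidate itemsets, rather than the raw column count, is the likely source of the memory pressure.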
