{'SYMBOL': {0: 'BAF180', 1: 'ACTL6A', 2: 'DMAP1', 3: 'C1orf149', 4: 'YEATS4'}, 'Gene Name(s)': {0: ';PB1;BAF180;MGC156155;MGC156156;PBRM1;', 1: ';ACTL6A;ACTL6;BAF53A;MGC5382;', 2: ';DMAP1;DKFZp686L09142;DNMAP1;DNMTAP1;FLJ11543;KIAA1425;EAF2;SWC4;', 3: ';FLJ11730;CDABP0189;C1orf149;NY-SAR-91;RP3-423B22.2;Eaf6;', 4: ';YEATS4;4930573H17Rik;B230215M10Rik;GAS41;NUBI-1;YAF9;'}, 'Description': {0: 'polybromo 1', 1: 'BAF complex 53 kDa subunit|BAF53|BRG1-associated factor|actin-related protein|hArpN beta; actin-like 6A', 2: 'DNA methyltransferase 1 associated protein 1; DNMT1 associated protein 1', 3: 'hypothetical protein LOC64769|sarcoma antigen NY-SAR-91; chromosome 1 open reading frame 149', 4: 'NuMA binding protein 1|glioma-amplified sequence-41; YEATS domain containing 4'}, 'G.O. PROCESS': {0: 'Transcription', 1: 'Transcription', 2: 'Transcription', 3: 'Transcription', 4: 'Transcription'}, 'TurboSEQUESTScore': {0: 70.29, 1: 80.29, 2: 34.18, 3: 30.32, 4: 40.18}, 'Coverage %': {0: 6.7, 1: 28.0, 2: 10.7, 3: 24.2, 4: 21.1}, 'KD': {0: 183572.3, 1: 47430.4, 2: 52959.9, 3: 21501.9, 4: 26482.7}, 'Genebank Accession no': {0: 30794372, 1: 4757718, 2: 13123776, 3: 29164895, 4: 5729838}, 'MS/MS Peptide no.': {0: '9 (9 0 0 0 0)', 1: '9 (9 0 0 0 0)', 2: '4 (3 0 0 1 0)', 3: '3 (3 0 0 0 0)', 4: '4 (4 0 0 0 0)'}}
I would like to detect and remove outliers in the TurboSEQUESTScore column,
using 3 standard deviations from the mean as the outlier threshold. How can I go about it? This is what I have tried.
The dataframe is named rename_df.
import numpy as np
from scipy import stats
z_scores = stats.zscore(rename_df['TurboSEQUESTScore'])
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=None)
I can't seem to get this to work properly.
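
One likely cause is that `.all(axis=None)` collapses the per-row comparison into a single True/False value, so there is no row mask left to filter with. Below is a minimal sketch of one way to do the filtering, assuming the goal is to keep rows whose score lies within 3 standard deviations of the column mean. The helper name `drop_outliers` and its `threshold` parameter are hypothetical, and the z-score is computed with pandas using `ddof=0`, which should match the default of `scipy.stats.zscore`.

```python
import pandas as pd


def drop_outliers(df, column, threshold=3.0):
    """Keep rows whose value in `column` is within `threshold` std devs of the mean.

    Hypothetical helper for illustration only.
    """
    values = df[column]
    # Population standard deviation (ddof=0), the same default scipy.stats.zscore uses.
    z_scores = (values - values.mean()) / values.std(ddof=0)
    # Boolean Series with one entry per row; do not reduce it with .all().
    mask = z_scores.abs() < threshold
    return df[mask]


# The two relevant columns from the sample data in the question.
rename_df = pd.DataFrame({
    "SYMBOL": ["BAF180", "ACTL6A", "DMAP1", "C1orf149", "YEATS4"],
    "TurboSEQUESTScore": [70.29, 80.29, 34.18, 30.32, 40.18],
})

filtered_df = drop_outliers(rename_df, "TurboSEQUESTScore")
```

With only the five sample rows shown, no score is more than 3 standard deviations from the mean, so every row is kept; the filter only starts removing rows on a larger table that contains a genuinely extreme value.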