I have a dict that I want to extract data from after profiling a dataset with pandas-profiling. I am trying to get the data for gfcid. I checked keys() and it returns 4 keys:
dict_keys(['table', 'variables', 'freq', 'correlations'])
'variables': count distinct_count p_missing n_missing p_infinite n_infinite \
prf_product 200 2 0 0 0 0
gfcid 61 2 0.695 139 0 0
arrg_id 200 182 0 0 0 0
For example, I want to fetch n_missing for gfcid, whose value is 139 (p_missing is 0.695). The problem I encountered is that gfcid appears in several places in the dict. What is the best approach to get this data out? The full dict is:
{'table': {'n': 200,
'nvar': 3,
'total_missing': 0.23166666666666666,
'n_duplicates': 18,
'memsize': '5.0 KiB',
'recordsize': '25.6 B',
'NUM': 1,
'DATE': 0,
'CONST': 0,
'CAT': 1,
'UNIQUE': 0,
'CORR': 0,
'RECODED': 0,
'BOOL': 1,
'UNSUPPORTED': 0,
'REJECTED': 0},
'variables': count distinct_count p_missing n_missing p_infinite n_infinite \
prf_product 200 2 0 0 0 0
gfcid 61 2 0.695 139 0 0
arrg_id 200 182 0 0 0 0
is_unique mode p_unique memorysize ... \
prf_product False Overdraft 0.01 1728 ...
gfcid False 1022506923 0.01 1928 ...
arrg_id False 458040000000328871 0.91 1728 ...
iqr kurtosis skewness sum \
prf_product NaN NaN NaN NaN
gfcid NaN NaN NaN NaN
arrg_id 1.97939e+18 -2.01933 0.000718585 -5533519320767099939
mad cv n_zeros p_zeros \
prf_product NaN NaN NaN NaN
gfcid NaN NaN NaN NaN
arrg_id 9.90032e+17 0.685487 0 0
histogram \
prf_product NaN
gfcid NaN
arrg_id data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...
mini_histogram
prf_product NaN
gfcid NaN
arrg_id data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...
[3 rows x 34 columns],
'freq': {'prf_product': Overdraft 100
Retails Cards 100
Name: prf_product, dtype: int64, 'gfcid': 1022506923 61
Name: gfcid, dtype: int64, 'arrg_id': 458040000001206947 2
458040000003802902 2
458040000003898582 2
458040000003488662 2
2409124554515908929 2
..
458040000000500916 1
2422373310444855111 1
458040000002710689 1
2459484652984972989 1
458040000000940444 1
Name: arrg_id, Length: 182, dtype: int64},
'correlations': {'pearson': gfcid arrg_id
gfcid NaN NaN
arrg_id NaN 1.0, 'spearman': gfcid arrg_id
gfcid NaN NaN
arrg_id NaN 1.0}}
I did this to fetch it and realised the values are returned near the top of the output:
desc = profile.get_description()
result = []
for i in desc.values():
    print(i.values)
This is what it returns. Is there a way to extract this data? I tried i[0] and it raises an error.
<built-in method values of dict object at 0x7f2db8e31318>
[[200 2 0.0 0 0.0 0 False 'Overdraft' 0.01 1728 'Overdraft' 100 'CAT' nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan]
[61 2 0.6950000000000001 139 0.0 0 False 1022506923 0.01 1928 1022506923
61 'BOOL' 1022506923.0 nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan]
[200 182 0.0 0 0.0 0 False 458040000000328871 0.91 1728 nan nan 'NUM'
1.4480719292929288e+18 9.926344464280993e+17 9.853231442356191e+35']]
<built-in method values of dict object at 0x7f2db8e31e10>
<built-in method values of dict object at 0x7f2db8e4e1f8>
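A minimal sketch of one way the value could be read out, assuming the 'variables' entry is a pandas DataFrame indexed by column name, as the printout above suggests. The small `desc` dict below is a hypothetical stand-in for the result of `profile.get_description()`, reduced to a few columns so the snippet runs without pandas-profiling. The last loop shows the likely reason for the `<built-in method values ...>` lines: on a plain dict, `values` is a method, while on a DataFrame it is an attribute holding the underlying array.

```python
import pandas as pd

# Hypothetical stand-in for profile.get_description(); only the shape matters.
desc = {
    "table": {"n": 200, "nvar": 3},
    "variables": pd.DataFrame(
        {
            "count": [200, 61, 200],
            "distinct_count": [2, 2, 182],
            "p_missing": [0.0, 0.695, 0.0],
            "n_missing": [0, 139, 0],
        },
        index=["prf_product", "gfcid", "arrg_id"],
    ),
    "freq": {},
    "correlations": {},
}

variables = desc["variables"]

# Label-based lookup: row label first, column label second.
n_missing = variables.loc["gfcid", "n_missing"]
p_missing = variables.loc["gfcid", "p_missing"]
print(n_missing, p_missing)

# All statistics for one variable, as a plain dict.
gfcid_stats = variables.loc["gfcid"].to_dict()

# Only one of the four entries is a DataFrame; the rest are dicts,
# which is why printing `.values` on them shows a bound method.
kinds = {
    key: "DataFrame" if isinstance(value, pd.DataFrame) else type(value).__name__
    for key, value in desc.items()
}
print(kinds)
```

With the real description, the same `.loc["gfcid", "n_missing"]` lookup should apply to `desc["variables"]` directly, provided the row and column labels match the printout.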