Have a problem computing pearson p-value for dict values

Question

I want to compute the Pearson p-value for the values of two dictionaries using a for loop. The dictionaries represent the data of two dataframes, one of which has some changes. Each dictionary contains the column names and, for each column, the keys and the histogram values. So basically I want to compute the p-value for each column across these two dictionaries.

Both dictionaries have the following structure:

{'columnname1': {'keys': [0, 46.72, 50], 'values': [41, 13, 23, 21...0, 0, 1]},
 'columnname2': {'keys': [0, 20, 50], 'values': [21, 43, 25, 2...0, 3, 15]},...}

To compute the p-value for each column I wrote the following function:

    def ChiTest(hist_1, hist_2):
        hist = {}
        for column1 in hist_1.keys():
            for column2 in hist_1.keys():
                hist[column1] = {}
                hist[column1]['keys'] = hist_2[column2]['keys']
                hist[column1]['pearson'] = pearsonr(hist_1[column1]['values'], hist_2[column2]['values'])
        return hist


    test = ChiTest(one, two)

The `hist[column]['keys']` assignment works well, but `hist[column]['pearson'] = pearsonr(hist_2[column]['values'], hist_1[column]['values'])` raises a KeyError:

KeyError: 'values'

And I can't figure out what I have missed. Any help is appreciated.

@Ison, you may have to post a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). For example, how does 'ShortData' come in? Can the code you have put here be used to debug this issue? — AAA, Jul 12 '19 at 10:53
@AAA. ShortData == keys. Fixed it in the post. IRL one dict uses the name ShortData and the other uses keys. Both mean "keys". So yes, the code can be used for debugging — Ison, Jul 12 '19 at 10:57
@Ison the code above works with no error: see [here](https://repl.it/repls/MiniBeautifulConferences) — AAA, Jul 12 '19 at 11:02
@AAA. Hmmm, seems problem is not in the func. Ok...Thanks for help! — Ison, Jul 12 '19 at 11:06
@Ison, probably had to do with you using `column` in both for loops — AAA, Jul 12 '19 at 11:07

Jay · Answer 1 · 2019-07-12T11:07:53.283

1

Well, the original answer is now obsolete. Which keys do you want in the `hist` output? The following is probably wrong; what do you want `hist` to return?

    from scipy.stats import pearsonr

    hist = {}
    # One entry per (column1, column2) pair across both dictionaries
    for column1 in hist_1.keys():
        for column2 in hist_2.keys():
            hist[(column1, column2)] = {}
            hist[(column1, column2)]['keys'] = hist_2[column2]['keys']
            hist[(column1, column2)]['pearson'] = pearsonr(hist_2[column2]['values'], hist_1[column1]['values'])

(Not clear what you are trying to get to :))
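If the goal is one p-value per column, as the question describes, a single loop over the column names that appear in both dictionaries may be closer to what is wanted than a nested loop. The following is a minimal sketch under that assumption. The function name `compare_columns` and the sample data are hypothetical, and it assumes both `'values'` lists for a column have the same length, which `scipy.stats.pearsonr` requires.

```python
from scipy.stats import pearsonr


def compare_columns(hist_1, hist_2):
    """Return Pearson r and p-value for each column present in both dicts."""
    result = {}
    # Only compare columns that exist in both dictionaries
    for column in hist_1.keys() & hist_2.keys():
        r, p_value = pearsonr(hist_1[column]['values'], hist_2[column]['values'])
        result[column] = {
            'keys': hist_2[column]['keys'],
            'r': r,
            'p_value': p_value,
        }
    return result


# Hypothetical sample data in the structure shown in the question
one = {'columnname1': {'keys': [0, 1, 2, 3], 'values': [41, 13, 23, 21]}}
two = {'columnname1': {'keys': [0, 1, 2, 3], 'values': [40, 15, 22, 20]}}

test = compare_columns(one, two)
print(test['columnname1']['p_value'])
```

Keying the result by column name keeps one entry per column, whereas the nested loop with tuple keys produces one entry per pair of columns.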

edited Jul 12 '19 at 11:07

answered Jul 12 '19 at 10:27

@Ison: You are still using `column` in both loops, and that can't work. Inside the inner loop, `column` will always be a key of hist_2, not of hist_1 – Jay Jul 12 '19 at 10:40
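
The effect described in this comment can be seen in a small standalone example. This sketch uses hypothetical key lists in place of the real dictionaries. It shows that when both loops use the same variable name, the name refers to the inner loop's item everywhere inside the inner body.

```python
# Hypothetical stand-ins for hist_1.keys() and hist_2.keys()
hist_1_keys = ['a', 'b']
hist_2_keys = ['x', 'y']

seen = []
for column in hist_1_keys:
    for column in hist_2_keys:  # rebinds the outer loop's variable
        # Here `column` is always a key of hist_2, never of hist_1
        seen.append(column)

print(seen)  # ['x', 'y', 'x', 'y']
```

Giving each loop its own variable name, such as `column1` and `column2`, avoids the rebinding.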
