
Hi, a while back I got help writing this function, but now I'm royally stuck.

    from scipy.stats import ttest_ind

    # Each input line looks like "529 the"; build a {word: count} dict.
    def input_file_to_dict(f):
        return dict((key, int(value)) for value, key in map(lambda line: line.split(), f))

    with open("count-pos.txt") as f:
        word_counts1 = input_file_to_dict(f)

    with open("count-neg.txt") as f:
        word_counts2 = input_file_to_dict(f)

Then I find all the words that appear in both lists:

    out = open('t-test_output.txt', 'w')
    common_words = set.intersection(set(word_counts1.keys()), set(word_counts2.keys()))
    for line in common_words:
        t, p = ttest_ind([word_counts1[k] for k in common_words], [word_counts2[k] for k in common_words])
        print >> out, (t, p)

As you can see, I'm trying to compare two lists that contain word frequencies, but some words don't appear in both samples. I want to perform a t-test on each word pair to determine their variance. However, this gives me the same t-value and p-value pair over and over again.

Anyone got some ideas?

The example files look like this (count-pos.txt):

    529 the
    469 want
    464 it
    449 de
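
(For reference, input_file_to_dict just turns lines in that shape into a plain {word: count} dict; a quick sketch of what I expect word_counts1 to contain for the sample above:)

    # Same parsing logic as input_file_to_dict, applied to the sample lines
    sample = ["529 the", "469 want", "464 it", "449 de"]
    print(dict((key, int(value)) for value, key in map(lambda line: line.split(), sample)))
    # -> {'the': 529, 'want': 469, 'it': 464, 'de': 449} (dict order may vary)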

1 Answer

This line is calculating the same values each time in your loop because you pass in the counts for all common_words every time:

    t, p = ttest_ind([word_counts1[k] for k in common_words], [word_counts2[k] for k in common_words])

Do you need to loop through all the common_words?
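
If what you actually want is a single comparison of the two lists of counts, you can drop the loop and call ttest_ind once. A minimal sketch (assuming the word_counts1 and word_counts2 dicts from your question, and writing a single result line to the output file):

    from scipy.stats import ttest_ind

    common_words = sorted(set(word_counts1) & set(word_counts2))
    pos_counts = [word_counts1[w] for w in common_words]
    neg_counts = [word_counts2[w] for w in common_words]

    # One test over the two aligned lists of counts, instead of
    # recomputing the identical test once per word.
    t, p = ttest_ind(pos_counts, neg_counts)

    with open('t-test_output.txt', 'w') as out:
        out.write('t = %s, p = %s\n' % (t, p))

Note that with only one count per word in each file there is nothing to test at the per-word level; a per-word t-test would need several observations per word (for example, counts from multiple documents).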

Brent Washburne