
I've been trying to run a Student's t-test on a DataFrame, but I always end up with an error like this:

    raise KeyError(key)
    KeyError: 'patid'

This is my DataFrame:


import pandas as pd

df = pd.DataFrame.from_records(data=[
  dict(id=1, rd=True, drk=True, hn=True, ch="a", age="40+", sex="m"),
  dict(id=2, rd=True, drk=True, hn=True, ch="a", age="40-", sex="m"),
  dict(id=3, rd=True, drk=True, hn=True, ch="a", age="40+", sex="m"),
  dict(id=4, rd=True, drk=True, hn=True, ch="a", age="40-", sex="m"),
  dict(id=5, rd=True, drk=True, hn=True, ch="a", age="40+", sex="m"),
  dict(id=6, rd=True, drk=True, hn=True, ch="a", age="40-", sex="m"),
  dict(id=1, rd=True, drk=True, hn=True, ch="b", age="40+", sex="m"),
  dict(id=2, rd=True, drk=True, hn=True, ch="b", age="40-", sex="m"),
  dict(id=3, rd=True, drk=True, hn=True, ch="b", age="40+", sex="m"),
  dict(id=4, rd=True, drk=True, hn=True, ch="b", age="40-", sex="m"),
  dict(id=5, rd=True, drk=True, hn=True, ch="b", age="40+", sex="m"),
  dict(id=6, rd=True, drk=True, hn=True, ch="b", age="40-", sex="m"),
  dict(id=10, rd=True, drk=True, hn=True, ch="a", age="40+", sex="f"),
  dict(id=20, rd=True, drk=True, hn=True, ch="a", age="40-", sex="f"),
  dict(id=30, rd=True, drk=True, hn=True, ch="a", age="40+", sex="f"),
  dict(id=40, rd=True, drk=True, hn=True, ch="a", age="40-", sex="f"),
  dict(id=50, rd=True, drk=True, hn=True, ch="a", age="40+", sex="f"),
  dict(id=60, rd=True, drk=True, hn=True, ch="a", age="40-", sex="f"),
  dict(id=10, rd=True, drk=True, hn=True, ch="b", age="40+", sex="f"),
  dict(id=20, rd=True, drk=True, hn=True, ch="b", age="40-", sex="f"),
  dict(id=30, rd=True, drk=True, hn=True, ch="b", age="40-", sex="f"),
  dict(id=40, rd=True, drk=True, hn=True, ch="b", age="40-", sex="f"),
  dict(id=50, rd=True, drk=True, hn=False, ch="b", age="40-", sex="f"),
  dict(id=60, rd=True, drk=True, hn=True, ch="b", age="40-", sex="f"),
  dict(id=100, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=200, rd=False, drk=False, hn=False, ch="a", age="40-", sex="f"),
  dict(id=300, rd=False, drk=False, hn=False, ch="b", age="40+", sex="m"),
  dict(id=400, rd=False, drk=False, hn=False, ch="b", age="40-", sex="f"),
  dict(id=500, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=600, rd=False, drk=False, hn=False, ch="a", age="40-", sex="f"),
  dict(id=100, rd=False, drk=False, hn=False, ch="b", age="40+", sex="m"),
  dict(id=200, rd=False, drk=False, hn=False, ch="b", age="40-", sex="m"),
  dict(id=300, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=400, rd=False, drk=False, hn=False, ch="a", age="40-", sex="f"),
  dict(id=500, rd=False, drk=False, hn=False, ch="a", age="40+", sex="m"),
  dict(id=600, rd=False, drk=False, hn=False, ch="a", age="40+", sex="m"),
  dict(id=1000, rd=False, drk=False, hn=True, ch="a", age="40+", sex="f"),
  dict(id=2000, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=3000, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=4000, rd=False, drk=False, hn=True, ch="a", age="40+", sex="f"),
  dict(id=5000, rd=False, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=6000, rd=False, drk=False, hn=False, ch="a", age="40-", sex="f"),
  dict(id=1000, rd=False, drk=False, hn=False, ch="b", age="40+", sex="f"),
  dict(id=2000, rd=False, drk=False, hn=False, ch="b", age="40-", sex="f"),
  dict(id=3000, rd=False, drk=False, hn=False, ch="b", age="40+", sex="f"),
  dict(id=4000, rd=False, drk=False, hn=False, ch="b", age="40-", sex="m"),
  dict(id=5000, rd=False, drk=False, hn=False, ch="b", age="40+", sex="m"),
  dict(id=6000, rd=False, drk=False, hn=False, ch="b", age="40-", sex="f"),
  dict(id=7000, rd=False, drk=False, hn=False, ch="b", age="40-", sex="m"),
  dict(id=8000, rd=False, drk=False, hn=False, ch="b", age="40+", sex="f"),
  dict(id=9000, rd=False, drk=False, hn=False, ch="b", age="40-", sex="m"),
  dict(id=10000, rd=True, drk=False, hn=False, ch="a", age="40+", sex="f"),
  dict(id=11000, rd=True, drk=False, hn=False, ch="b", age="40-", sex="m"),
])

Initially I didn't know how to build the pairs needed for the test, but I found a question on Stack Overflow with the same problem. I implemented the suggested fix, but it still doesn't work. Here's what I have:


from itertools import combinations
from scipy.stats import ttest_ind

ys = ('sex', 'cohort', 'age', 't2d', 'htn', 'smk')

def ttest(pair):
    for y in ys:
        results = ttest_ind(df[pair[0]][y], df[pair[1]][y])
    print(f"t-test between {pair[0]} and {pair[1]} is {results[0]} and p-value: {results[1]}")

all_combinations = list(combinations(list(df.keys()), 2)) # find all combinations in the keys of the dict with dataframes
[ttest(i) for i in all_combinations] # pass all combinations through the function ttest
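
One thing worth noting about the snippet above: on a single DataFrame, `keys()` returns the column labels, so the pairs are pairs of column names, whereas the comment suggests the borrowed code expected a dict of DataFrames. A minimal sketch on a hypothetical toy frame (`toy` and its values are assumptions, not my real data) shows the kind of `KeyError` this can produce. The key in my real traceback is different, so this may not be the whole story:

```python
from itertools import combinations

import pandas as pd

# Toy frame with made-up values; only the column names matter here.
toy = pd.DataFrame({"id": [1, 2], "rd": [True, False], "sex": ["m", "f"]})

# keys() on a DataFrame returns the column labels,
# so every pair is a pair of column names, not a pair of groups.
pairs = list(combinations(list(toy.keys()), 2))
print(pairs)

# toy["id"] is a Series indexed by row number, so looking up
# a column name in it fails with a KeyError.
raised = False
try:
    toy["id"]["sex"]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```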

When I run it, the error is:

    raise KeyError(key)
    KeyError: 'patid'

I've tried other options, such as using a for loop, but I can't seem to make it work.
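
To make the goal concrete, here is a minimal sketch of the kind of pairwise comparison meant here, on made-up data. The names `toy`, `grp`, and `pairwise_ttests` are hypothetical and not part of my real code, and it assumes the groups come from the values of one column and that a boolean column can be cast to 0/1 for the test:

```python
from itertools import combinations

import pandas as pd
from scipy.stats import ttest_ind

# Made-up data: `grp` stands in for a grouping column, `hn` for an outcome.
toy = pd.DataFrame({
    "grp": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "hn": [True, True, False, False, False, True, True, False, True],
})


def pairwise_ttests(frame, group_col, value_col):
    """Run an independent two-sample t-test for every pair of groups."""
    # Split the value column by group and cast booleans to floats (0.0 / 1.0).
    groups = {
        name: sub[value_col].astype(float)
        for name, sub in frame.groupby(group_col)
    }
    rows = []
    # combinations() over the dict iterates its keys, i.e. the group names.
    for g1, g2 in combinations(groups, 2):
        stat, pvalue = ttest_ind(groups[g1], groups[g2])
        rows.append({"group_1": g1, "group_2": g2, "t": stat, "p": pvalue})
    return pd.DataFrame(rows)


result = pairwise_ttests(toy, "grp", "hn")
print(result)
```

Here the pairs are built from group names rather than from column names, which is the part I have not managed to reproduce on my real DataFrame.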

If someone could tell me how to run the test on all possible combinations, I'd be very grateful! Thank you!
