import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame(np.random.randint(0,100,size=(15, 3)), columns=list('NMO'))
df['Category1'] = ['I','I','I','I','I','G','G','G','G','G','P','P','I','I','P']
df['Category2'] = ['W','W','C','C','C','W','W','W','W','W','O','O','O','O','O']

If I wanted to do a t-test on this data, based on both categories, how would I refer to the categories?

If I was doing the test on one category it would look like:

ttest_ind(
    df[df['Category1']=='P']['N'], 
    df[df['Category1']=='I']['N'])

But what if I wanted to compare rows selected on both categories at once, for example rows that are both G and W against rows that are both I and W? I tried this, but it doesn't work:

ttest_ind(
df[[df['Category1']=='G'] and [df['Category2']=='W']]['N'], 
df[[df['Category1']=='I'] and [df['Category2']=='W']]['N'])
Nick ODell
Baaridi

1 Answer

Change

df[[df['Category1']=='G'] and [df['Category2']=='W']]['N']

to

df[(df['Category1']=='G') & (df['Category2']=='W')]['N']

and similarly for the 'I'/'W'/'N' line.
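For completeness, here is a minimal end-to-end sketch of the corrected test. It assumes ttest_ind comes from scipy.stats (the question doesn't show its imports), and it swaps in a seeded generator so the numbers are repeatable; the seed and the names g_w, i_w, n_gw and n_iw are my own choices, not from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind  # assumption: the question's ttest_ind is SciPy's

# Seeded generator so the sketch is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(15, 3)), columns=list("NMO"))
df["Category1"] = ["I", "I", "I", "I", "I", "G", "G", "G", "G", "G", "P", "P", "I", "I", "P"]
df["Category2"] = ["W", "W", "C", "C", "C", "W", "W", "W", "W", "W", "O", "O", "O", "O", "O"]

# Build each row mask once; the parentheses matter because & binds tighter than ==.
g_w = (df["Category1"] == "G") & (df["Category2"] == "W")
i_w = (df["Category1"] == "I") & (df["Category2"] == "W")

# .loc[mask, column] picks the rows and the column in a single step.
n_gw = df.loc[g_w, "N"]
n_iw = df.loc[i_w, "N"]

result = ttest_ind(n_gw, n_iw)
print(result.statistic, result.pvalue)
```

Note that with the categories as given, the I/W group has only two rows, so the p-value from this particular comparison shouldn't be trusted much.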


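Python's and evaluates each operand for "truthiness" and returns one of the operands, whereas & on pandas/NumPy boolean data is applied element-wise (for two boolean vectors it gives the same result as np.logical_and), which is what you want here. Also, put each comparison in parentheses instead of square brackets: square brackets build a one-element list, and the parentheses are needed because & binds more tightly than ==. Effectively:

# won't work: a non-empty list is truthy, so and just returns the second list
df[ [boolean series] and [boolean series] ] -> df[ [second boolean series] ]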

# vs

# will work
df[ (boolean series) & (boolean series) ] -> df[ 'logical-and'ed boolean series ]
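If you want to see the difference for yourself, here is a small sketch on two toy Series (a and b are made-up names for illustration, not from the question):

```python
import pandas as pd

a = pd.Series([True, True, False])
b = pd.Series([True, False, False])

# A one-element list is truthy, so `and` hands back its second operand untouched.
wrapped = [a] and [b]
print(wrapped[0] is b)

# Without the list wrappers, `and` asks for the truth value of a whole Series,
# which pandas refuses to guess.
try:
    a and b
except ValueError as exc:
    print("ValueError:", exc)

# & combines the two masks element by element.
combined = a & b
print(combined.tolist())  # [True, False, False]
```

So the bracketed version never combines the two conditions at all; it silently throws the first one away.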
Joshua Voskamp