Edit: All sorted; Imanol Luengo set me straight. Here's the end result, in all its glory.
I don't understand the counts of my variables, maybe someone can explain? I'm filtering two columns of pass/fails for two locations. I want a count of all 4 pass/fails.
Here's the head of the two columns. There are 126 rows in total:
   WT Result School
0          p  Milan
1          p   Roma
2          p  Milan
3          p  Milan
4          p   Roma
Code so far:
import numpy as np
import pandas as pd
# `data` is the full DataFrame loaded earlier; keep only the two columns of interest
data2 = data[['WT Result', 'School']].dropna()
# Milan Counts
m_p = (data2['School']=='Milan') & (data2['WT Result']=='p')
milan_p = (m_p==True)
milan_pass = np.count_nonzero(milan_p) # Count of Trues for Milan
# Rome Counts
r_p = (data2['School']=='Roma') & (data2['WT Result']=='p')
rome_p = (r_p==True)
rome_pass = np.count_nonzero(rome_p) # Count of Trues for Rome
So what I've done, after stripping the excess columns (data2), is:
- filter by location and == 'p' (vars m_p and r_p)
- then filter by == True (vars milan_p and rome_p)
- do a count_nonzero() to count the 'True' values (vars milan_pass and rome_pass)
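A minimal sketch of the same steps, using only the five sample rows shown above so that it runs on its own. The numbers therefore differ from the real 126-row data, and only the Milan half is shown:

```python
import numpy as np
import pandas as pd

# Only the five sample rows from the post, not the full 126-row data set
data2 = pd.DataFrame({
    "WT Result": ["p", "p", "p", "p", "p"],
    "School": ["Milan", "Roma", "Milan", "Milan", "Roma"],
})

# Step 1: compare by location and result
m_p = (data2["School"] == "Milan") & (data2["WT Result"] == "p")
# Step 2: compare the result with True
milan_p = (m_p == True)
# Step 3: count the True values
milan_pass = np.count_nonzero(milan_p)

print(len(data2), len(m_p), len(milan_p))  # 5 5 5
print(milan_pass)  # 3
```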
Here's what I don't understand - these are the lengths of the variables:
data2: 126
m_p: 126
r_p: 126
milan_p: 126
rome_p: 126
milan_pass: 55
rome_pass: 47
Why do the lengths remain 126 once the filtering starts? To me, this shows that neither the filtering by location nor the filtering by 'p' worked. But when I do the final count_nonzero(), the results are suddenly separated by location. What is happening?
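For anyone else puzzled by this, a rough sketch of what seems to be going on: comparing a column with == does not drop rows, it returns a boolean Series with one True/False per row, so its length always matches the frame. Rows only disappear when that mask is used to index the frame. The sample below is hypothetical (the 'f' label for fails and the extra rows are assumptions, not the real data), and pd.crosstab is just one way to get all four counts, not necessarily the end result mentioned in the edit.

```python
import pandas as pd

# Hypothetical sample: the 'f' rows and the 'f' label are invented for illustration
data2 = pd.DataFrame({
    "WT Result": ["p", "p", "p", "f", "p", "f"],
    "School": ["Milan", "Roma", "Milan", "Milan", "Roma", "Roma"],
})

# A comparison yields a boolean mask with one entry per row
mask = (data2["School"] == "Milan") & (data2["WT Result"] == "p")
print(len(mask))   # 6, same length as data2

# Indexing with the mask is what actually filters the rows
filtered = data2[mask]
print(len(filtered))  # 2

# Summing the mask counts the True values without filtering
print(mask.sum())  # 2

# All pass/fail counts per school in one table
counts = pd.crosstab(data2["School"], data2["WT Result"])
print(counts)
```

With this made-up sample, counts.loc["Milan", "p"] gives the Milan passes and counts.loc["Roma", "f"] the Roma fails.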