I have a problem that I am 99% sure has a numpy broadcasting solution, but I'm unable to figure it out. Suppose I have the following dataframe:
import numpy as np
import pandas as pd
iterables = [['US', 'DE'], ['A', 'B'], [1, 2, 3, 4, 5]]
idx3 = pd.MultiIndex.from_product(iterables, names=['v1', 'v2', 'v3'])
df3 = pd.DataFrame(data=np.random.randn(20, 2), index=idx3, columns=['c1', 'c2'])
print(df3)
                c1        c2
v1 v2 v3
US A  1  -0.023208 -1.047208
      2   1.128917  0.292252
      3  -0.441574  0.038714
      4   1.057893  1.313874
      5   0.938736 -0.130192
   B  1  -0.479439 -0.311465
      2  -1.730325 -1.300829
      3  -0.112920 -0.269385
      4   1.436866  0.197434
      5   1.659529  2.107746
DE A  1   0.533169 -0.539891
      2   0.225635  1.406626
      3  -0.928966  0.979749
      4  -0.109132  0.862450
      5  -0.481120  1.425678
   B  1   0.592646 -0.573862
      2  -1.135009 -0.365472
      3   0.728357  0.744631
      4   0.156970  0.623244
      5  -0.071628 -0.089194
Now suppose I want a column c3 that is equal to column c1 for values 1-3 of index level v3, and equal to column c2 for values 4-5 of index level v3.
Using apply, this ought to be easy:
df3.reset_index('v3').apply(lambda df: df.c1 if df.v3<=3 else df.c2, axis=1)
But this loops through every row and checks the condition each time. Using boolean indexing, I can get this far:
bool1 = df3.loc[df3.index.get_level_values('v3')<=3,['c1']]
bool2 = df3.loc[df3.index.get_level_values('v3')>3,['c2']]
print(bool1)
                c1
v1 v2 v3
US A  1  -0.023208
      2   1.128917
      3  -0.441574
   B  1  -0.479439
      2  -1.730325
      3  -0.112920
DE A  1   0.533169
      2   0.225635
      3  -0.928966
   B  1   0.592646
      2  -1.135009
      3   0.728357
print(bool2)
                c2
v1 v2 v3
US A  4   1.313874
      5  -0.130192
   B  4   0.197434
      5   2.107746
DE A  4   0.862450
      5   1.425678
   B  4   0.623244
      5  -0.089194
But I can't figure out how to get this back into my original dataframe. I feel like I'm basically there, but I keep running into dead ends.
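For what it's worth, one vectorized route that might do this is sketched below. It assumes the columns are named c1 and c2 as in the printout, builds a boolean mask from the v3 level, and picks between the two columns element-wise with np.where. The seed and the name use_c1 are only there for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the seed is arbitrary, just for repeatability.
iterables = [['US', 'DE'], ['A', 'B'], [1, 2, 3, 4, 5]]
idx3 = pd.MultiIndex.from_product(iterables, names=['v1', 'v2', 'v3'])
rng = np.random.default_rng(0)
df3 = pd.DataFrame(rng.standard_normal((20, 2)), index=idx3, columns=['c1', 'c2'])

# One boolean per row: True where index level v3 is 1, 2 or 3.
use_c1 = df3.index.get_level_values('v3') <= 3

# Element-wise selection: c1 where the mask is True, c2 otherwise.
df3['c3'] = np.where(use_c1, df3['c1'], df3['c2'])

print(df3)
```

Because the result has one value per row of df3, it can be assigned straight back as a new column with no need to stitch bool1 and bool2 together. The pandas-only spelling df3['c1'].where(use_c1, df3['c2']) should give the same column.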