How to write an efficient multiple criteria search function in pandas?

Question

I have a dataframe like this.

I want to search in the columns zip and div and get type and apply that result to all unique zips. Kind of pivot function but return the actual value instead of an aggregate function.(Assume there is only one combination of zip and div)

df1 = pd.DataFrame(list(product(list(range(100,200)), ['A','B','C','D','E'])), columns=['zip', 'div'])
df1 = df1.drop(df1.index[np.random.randint(0,499,size=100)]).reset_index()
df1['type'] = np.random.choice(['P','Q','R'],size=df1.shape[0])

I have tried apply/lambda function but it is very slow. My data contains 500K rows in df1 with 41K unique zips and 15 unique divs

Is there an efficient way to get a result like the following.

df2 = pd.DataFrame({'zip':[100,101],'A':['Q','P'],'B':['Q','Q'],'C':['Q','P'],'D':['Q','R'],'E':['Q','P']})

Assume zip is non-numeric.

You want to [Pivot without aggregating?](https://stackoverflow.com/questions/54239494/duplicate-entries-and-rename-column-row-in-pandas-pivot-table-without-aggregatio) — m13op22, Apr 26 '19 at 18:48

score 2 · Accepted Answer · answered Apr 26 '19 at 19:24

2

An alternate solution, if you would like to use a pivot table:

df_pivot=df.pivot_table(index='zip',columns=['div'],aggfunc='first')

answered Apr 26 '19 at 19:24

Nev1111

1,039
9
13

anky · Answer 2 · 2019-04-26T18:59:02.797

1

Try with:

m=df.groupby('zip')['type'].apply(list)
n=pd.DataFrame(m.values.tolist(),columns=df['div'].unique(),index=m.index)
print(n)

     A  B  C  D  E
zip               
100  Q  Q  Q  Q  Q
101  P  Q  P  R  P

P.S You shouldn't have div as a column as this is a pandas function(i suggest you change it to something else apart from div)

edited Apr 26 '19 at 18:59

answered Apr 26 '19 at 18:54

anky

74,114
11
41
70

For some reason I am getting an error on real data. `13 columns passed, passed data had 187 columns` while calculating n – Vinay Apr 26 '19 at 20:30

How to write an efficient multiple criteria search function in pandas?

2 Answers2