1

I have a dataframe like this.

enter image description here

I want to search in the columns zip and div and get type and apply that result to all unique zips. Kind of pivot function but return the actual value instead of an aggregate function.(Assume there is only one combination of zip and div)

df1 = pd.DataFrame(list(product(list(range(100,200)), ['A','B','C','D','E'])), columns=['zip', 'div'])
df1 = df1.drop(df1.index[np.random.randint(0,499,size=100)]).reset_index()
df1['type'] = np.random.choice(['P','Q','R'],size=df1.shape[0])

I have tried apply/lambda function but it is very slow. My data contains 500K rows in df1 with 41K unique zips and 15 unique divs

Is there an efficient way to get a result like the following.

enter image description here

df2 = pd.DataFrame({'zip':[100,101],'A':['Q','P'],'B':['Q','Q'],'C':['Q','P'],'D':['Q','R'],'E':['Q','P']})

Assume zip is non-numeric.

Vinay
  • 1,149
  • 3
  • 16
  • 28
  • You want to [Pivot without aggregating?](https://stackoverflow.com/questions/54239494/duplicate-entries-and-rename-column-row-in-pandas-pivot-table-without-aggregatio) – m13op22 Apr 26 '19 at 18:48

2 Answers2

2

An alternate solution, if you would like to use a pivot table:

df_pivot=df.pivot_table(index='zip',columns=['div'],aggfunc='first')
Nev1111
  • 1,039
  • 9
  • 13
1

Try with:

m=df.groupby('zip')['type'].apply(list)
n=pd.DataFrame(m.values.tolist(),columns=df['div'].unique(),index=m.index)
print(n)

     A  B  C  D  E
zip               
100  Q  Q  Q  Q  Q
101  P  Q  P  R  P

P.S You shouldn't have div as a column as this is a pandas function(i suggest you change it to something else apart from div)

anky
  • 74,114
  • 11
  • 41
  • 70
  • For some reason I am getting an error on real data. `13 columns passed, passed data had 187 columns` while calculating n – Vinay Apr 26 '19 at 20:30