I have a dataframe with 294467 rows and 7 columns. I want to replace each brand name with a number, so that all products of the same brand get the same number.
Here is an example of my dataframe:
overall ... brand
0 5.0 ... Pirmal Healthcare
1 5.0 ... Pirmal Healthcare
2 5.0 ... Pirmal Healthcare
3 5.0 ... Pirmal Healthcare
4 4.0 ... Pirmal Healthcare
... ... ...
294975 4.0 ... Gentlemen's Hardware
294976 5.0 ... Benefit Cosmetics
294977 1.0 ... Salon Perfect
294978 1.0 ... GBSTORE
294979 1.0 ... GBSTORE
[294467 rows x 7 columns]
The final result should be:
overall ... brand
0 5.0 ... 1
1 5.0 ... 1
2 5.0 ... 1
3 5.0 ... 1
4 4.0 ... 1
... ... ...
294975 4.0 ... 7839
294976 5.0 ... 7840
294977 1.0 ... 7841
294978 1.0 ... 7842
294979 1.0 ... 7842
[294467 rows x 7 columns]
To get this result, I sorted my dataframe by brand and then assigned a new number to each distinct brand with this code:
sorted_copy = copy.sort_values('brand')
random_number = 0  # running brand number (a counter, not actually random)
first = ""         # brand seen on the previous row
for f, row in sorted_copy.iterrows():
    i = row['brand']
    if first != i:
        # New brand: remember it and move on to the next number.
        first = i
        random_number = random_number + 1
    sorted_copy.at[f, 'brand'] = random_number
However, this process took about an hour and a half. Is there a faster way to get this result? Can anyone help?
Thank you.
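One direction that might avoid the row-by-row loop is a vectorized encoding such as `pd.factorize` with `sort=True`, which numbers the distinct values in sorted order. This is only a sketch on a small made-up dataframe, not something timed on the full data; the frame `df` and the column name `brand_id` are assumptions for illustration, and missing brands (NaN) would come out as 0 here rather than getting their own number.

```python
import pandas as pd

# Hypothetical miniature of the dataframe in the question.
df = pd.DataFrame({
    "overall": [5.0, 4.0, 1.0, 1.0, 5.0],
    "brand": [
        "Pirmal Healthcare",
        "Pirmal Healthcare",
        "GBSTORE",
        "GBSTORE",
        "Benefit Cosmetics",
    ],
})

# factorize returns one integer code per row plus the distinct values.
# sort=True numbers the brands in sorted order, like sorting first and
# then counting up; codes start at 0, so add 1 to start at 1.
codes, uniques = pd.factorize(df["brand"], sort=True)
df["brand_id"] = codes + 1

print(df)
```

Writing the numbers to a separate column keeps the original brand names available; assigning to `df["brand"]` instead would overwrite them as in the loop version. `df.groupby("brand").ngroup() + 1` should produce the same numbering.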