I have a dataframe with 294467 rows and 7 columns. I want to replace each brand name with a number, so that all rows with the same brand get the same number.

Here is an example of my dataframe:

        overall  ...                 brand
0           5.0  ...     Pirmal Healthcare
1           5.0  ...     Pirmal Healthcare
2           5.0  ...     Pirmal Healthcare
3           5.0  ...     Pirmal Healthcare
4           4.0  ...     Pirmal Healthcare
 ...        ...                   ...
294975      4.0  ...  Gentlemen's Hardware
294976      5.0  ...     Benefit Cosmetics
294977      1.0  ...         Salon Perfect
294978      1.0  ...               GBSTORE
294979      1.0  ...               GBSTORE

[294467 rows x 7 columns]

The final result should be:

        overall  ...    brand
0           5.0  ...     1
1           5.0  ...     1
2           5.0  ...     1
3           5.0  ...     1
4           4.0  ...     1
  ...       ...         ...
294975      4.0  ...    7839
294976      5.0  ...    7840
294977      1.0  ...    7841
294978      1.0  ...    7842
294979      1.0  ...    7842

[294467 rows x 7 columns]

To get this result, I sorted my dataframe by brand, then assigned a different number to each brand with this code:

sorted_copy = copy.sort_values('brand')

random_number = 0
first = ""
for f, row in sorted_copy.iterrows():
    i = row['brand']

    # A new brand starts here: remember it and move on to the next number.
    if first != i:
        first = i
        random_number = random_number + 1

    sorted_copy.at[f, 'brand'] = random_number

However, this took about an hour and a half. Is there a way to get the same result faster? Can anyone help?

Thank you.

jazz kek

1 Answer

df['brand'] = df['brand'].astype("category").cat.codes should work fine.
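
As a rough illustration of the one-liner above, here is a minimal sketch on a toy frame. The sample rows and the `brand_id` column name are made up for the example and are not from the question. Category codes start at 0, so the sketch adds 1 to match the numbering shown in the expected output.

```python
import pandas as pd

# Toy data standing in for the real 294467-row frame (values are illustrative).
df = pd.DataFrame({
    "overall": [5.0, 4.0, 1.0, 1.0, 5.0],
    "brand": [
        "Pirmal Healthcare",
        "Pirmal Healthcare",
        "GBSTORE",
        "GBSTORE",
        "Benefit Cosmetics",
    ],
})

# Codes follow the sorted order of the distinct brand names and start at 0;
# adding 1 makes the numbering start at 1, as in the question.
# "brand_id" is a hypothetical column name; assign to "brand" to overwrite in place.
df["brand_id"] = df["brand"].astype("category").cat.codes + 1

print(df)
```

This runs as a single vectorized operation instead of a Python-level loop over rows, and it does not need the frame to be sorted first. Rows with a missing brand get code -1 from `cat.codes`, so they would end up as 0 after adding 1.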