
I know this is an easy question, but I checked many sites and couldn't find a solution to my problem.

I have a DataFrame, and one of its columns holds the brand. I want to assign a specific number to each brand to make aggregating by brand easier.

import pandas as pd

last = pd.read_pickle('pre_clustering.pkl')

random_number=9288
first=""
f=0
for i in last['brand']:
    
    if(type(i)==str):
        if(first == i):
            last.at[f, 'brand']= random_number
            print(last.loc[f, 'brand'])
            f=f+1
            
            
        elif(first !=i):
            first=i
            random_number= random_number +1
            last.at[f, 'brand'] = random_number
            print(last.loc[f, 'brand'])
            f=f+1
           
    else:
        f=f+1
    
brand = last['brand']      

This is my code and output. I tried everything to convert the values to integers, but they are still strings. I checked my if/else condition with print() to be sure, and it works, as you can see.

What is wrong with my code, or what should I do to convert my strings to integers?

jazz kek

2 Answers


In your code, you use the counter f as the row index into last, but last is sorted by brand, so the values of f (0, 1, 2, ...) are not the index labels of the rows you are visiting. As a result, you put the numbers in the wrong rows and leave other rows unchanged.
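A minimal sketch of the mismatch, assuming last was sorted with sort_values (which keeps the original index labels). The three-row DataFrame is hypothetical sample data standing in for pre_clustering.pkl:

```python
import pandas as pd

# Hypothetical sample data standing in for pre_clustering.pkl
df = pd.DataFrame({"brand": ["zeta", "alpha", "mid"]})

# Sorting keeps the original index labels, so they are no longer 0, 1, 2 in row order
df_sorted = df.sort_values("brand")
print(list(df_sorted.index))

# The question's loop walks the rows in order but writes to the label f = 0, 1, 2, ...
for f, brand in enumerate(df_sorted["brand"]):
    print(f"position {f} holds {brand!r}, but label {f} holds {df_sorted.at[f, 'brand']!r}")
```

Here the row visited first ("alpha") has label 1, so writing to label 0 changes the "zeta" row instead.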

To correct the code, use last.iterrows() in the for loop as follows:

for f, row in last.iterrows():
    i = row['brand']

Here f is the index label of the row you are dealing with, so you no longer need f=f+1, and i holds the brand of that row.
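A small sketch, again on hypothetical sorted sample data, showing that iterrows() yields the index label of each row, so writing through last.at[f, ...] addresses the same row whose brand is in i:

```python
import pandas as pd

# Hypothetical sample data, sorted by brand so the labels are out of order
df = pd.DataFrame({"brand": ["zeta", "alpha", "mid"]}).sort_values("brand")

for f, row in df.iterrows():
    i = row["brand"]
    # f is this row's index label, so df.at[f, "brand"] is the same value as i
    assert df.at[f, "brand"] == i
    print(f, i)
```

This prints the labels 1, 2, 0 next to alpha, mid, zeta. Note that iterrows() is slow on large DataFrames, so it suits small data or a one-off fix.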

Finally, here is your code with my modifications, explained in comments:

import pandas as pd

last = pd.read_pickle('pre_clustering.pkl')

random_number=9288
first=""
# f=0 (No need)
for f, row in last.iterrows():  # for i in last['brand']:  (Changed: f is the actual row index)
    i=row['brand'] # (added)

    if(type(i)==str):
        if(first == i):
            last.at[f, 'brand']= random_number
            print(last.loc[f, 'brand'])
            # f=f+1   (No need)
            
        elif (first !=i): 
            first=i
            random_number= random_number +1
            last.at[f, 'brand'] = random_number
            print(last.loc[f, 'brand'])
            # f=f+1
           
    #else:
    #    f=f+1
    
brand = last['brand']  
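As an alternative to the loop (a different technique from the one in this answer, not a correction of it), pandas can number the brands in one vectorized step with pd.factorize. This is a sketch under assumptions: the sample data is hypothetical, missing brands are represented by None, and the IDs go into a new column brand_id (a hypothetical name) so the original names are kept. For data sorted by brand, as in the question, it should give the same numbers as the loop, starting at 9289:

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing (non-string) brand
df = pd.DataFrame({"brand": ["apple", "apple", None, "sony", "lg"]})

# factorize gives each distinct brand a code 0, 1, 2, ... (ordered by name
# because sort=True) and gives missing values the code -1
codes, uniques = pd.factorize(df["brand"], sort=True)

# Offset the codes so the first brand gets 9289, like the loop;
# nullable Int64 lets missing brands stay missing while the rest are integers
brand_id = pd.Series(codes + 9289, index=df.index).astype("Int64")
df["brand_id"] = brand_id.where(codes >= 0)

print(df)
print(list(uniques))  # code 0, 1, 2 map back to these brand names
```

Unlike the loop, factorize gives the same number to a brand even if its rows are not next to each other, so it does not depend on the sort order.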

Do your best :)

Nour-Allah Hussein

Did you try typecasting with .astype('int')? More details here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html
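A quick sketch of what .astype('int') can and cannot do here, using hypothetical values: it parses strings that already contain digits, but brand names are not numbers, so casting them directly raises ValueError.

```python
import pandas as pd

# Strings that already hold digits can be cast directly
digits = pd.Series(["9289", "9290"])
as_ints = digits.astype("int")
print(as_ints.tolist())  # [9289, 9290]

# Brand names are not numbers, so the same cast raises ValueError
names = pd.Series(["apple", "sony"])  # hypothetical brand names
try:
    names.astype("int")
    failed = False
except ValueError as err:
    failed = True
    print("astype failed:", err)
```

So astype only helps once every value in the column is already a numeric string or number.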

George
I already tried that, and my problem wasn't what you guessed. The other answer is correct; you can check it if you like. Thanks for your effort anyway! – jazz kek Jan 08 '21 at 13:29