
I have a pandas DataFrame with a column named "Score" that I need to rank. However, the column contains numbers like 100 200 300 as well as numbers like 500,00 800,00.

I need to clean this column before I can rank it.

When I try to convert the strings to int, I get this error:

invalid literal for int() with base 10: '610,00'

And when I try to convert to float first and then to integer, I get:

could not convert string to float: '610,00'

I need to clean the column so that the numbers are not 500,00 800,00 etc., but 500 800 610...

  • 1
    If the `,` is always a decimal separator (and not a thousands separator), perhaps you could first replace `,` with `.` – Swifty Mar 15 '23 at 17:05

1 Answer


@Jheison, expanding on what Swifty commented, if the comma is a decimal separator, as in some locales (Latin America, for example), you could do the following.

  • You could directly modify the DataFrame (the column is named "Score" in your question):

    df["Score"] = pd.to_numeric(df["Score"].str.replace(",", "."), downcast="integer")
    

    Since you need to rank these, I suggest using the pandas rank method (Series.rank), which lets you choose how ties are handled through its method argument.

    The options for ranking a group of records that have the same value (i.e. ties) are:

    • average: average rank of the group
    • min: lowest rank in the group
    • max: highest rank in the group
    • first: ranks assigned in order they appear in the array
    • dense: like ‘min’, but rank always increases by 1 between groups.

    Another key argument is ascending, which sets whether the numbers are ranked in ascending order (the lowest score gets rank 1, the default) or in descending order (the highest score gets rank 1).
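To see the cleaning and the ranking together, here is a minimal sketch. The sample values and the "Rank" and "DenseRank" column names are made up for illustration, and it assumes every value in "Score" is a string and that the comma is always a decimal separator.

```python
import pandas as pd

# Hypothetical sample data mirroring the question: strings, some with a decimal comma.
df = pd.DataFrame({"Score": ["100", "610,00", "300", "500,00", "300"]})

# Swap the decimal comma for a dot, then convert to numbers.
# downcast="integer" gives an integer dtype when no value has a fractional part.
df["Score"] = pd.to_numeric(
    df["Score"].str.replace(",", ".", regex=False), downcast="integer"
)

# ascending=False gives rank 1 to the highest score.
# method="min": tied scores share the lowest rank of their group, and the next rank is skipped.
df["Rank"] = df["Score"].rank(method="min", ascending=False).astype(int)

# method="dense": tied scores share a rank and the next rank is not skipped.
df["DenseRank"] = df["Score"].rank(method="dense", ascending=False).astype(int)

print(df)
```

With these values the two 300s tie: "Rank" comes out as 5, 1, 3, 2, 3 and "DenseRank" as 4, 1, 3, 2, 3. If the column could contain anything that is not a number, passing errors="coerce" to pd.to_numeric turns those entries into NaN instead of raising.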

Lastly, I would have added the Pandas tag to your question.