I have a tennis dataframe called atp_main with 50 columns and about 160,000 rows. It has a column called winner_age (and a corresponding loser_age column). At first, these ages showed up with many decimal places, e.g. 22.321234342..., 25.23345565665..., etc. The dtype of this column was 'object'.
I cleaned the data so that the ages print like ints, i.e. 22, 25, 31, etc. I did this by converting the column dtype to float, as there appeared to be issues with converting to int:
atp_main['winner_age'] = pd.to_numeric(atp_main['winner_age'], errors='coerce')
This column's dtype is now float. I then set the display format to show nothing after the decimal point, i.e. 23 rather than 23.123212432...:
pd.options.display.float_format = '{:,.0f}'.format
When I call this column and print its values:
atp_main['winner_age']
it prints a list of the ages like so:
23
22
21
31
18
However, when I apply a function like mean(), it returns a value with many decimal places:
atp_main['winner_age'].mean()
23.4353423354545
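To make this easier to reproduce, here is a minimal sketch with made-up values standing in for my real column (the `ages` Series and its contents are hypothetical, not taken from atp_main). As far as I can tell, it suggests the float_format option only changes how pandas displays floats, while the stored values keep their decimals:

```python
import pandas as pd

# Hypothetical stand-in for atp_main['winner_age']: strings plus a missing value
ages = pd.Series(["22.321234342", "25.233455656", None], name="winner_age")
ages = pd.to_numeric(ages, errors="coerce")  # dtype becomes float64

# Display-only setting: affects how pandas prints floats, not the stored data
pd.options.display.float_format = "{:,.0f}".format

print(ages)          # printed with no decimals
print(ages.dtype)    # still float64
mean_age = ages.mean()
print(mean_age)      # a plain float, printed with its full decimals

# Restore the default display format so later output is unaffected
pd.reset_option("display.float_format")
```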
Ideally, I would like the winner_age column to be an int and not a float. I tried to convert it like so:
atp_main['winner_age'].astype(int)
But got this error:
ValueError: Cannot convert non-finite values (NA or inf) to integer
Note that the dataframe has quite a number of NaN values in the first portion of the dataset.
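Here is a small reproduction of the conversion error, again with made-up values (the `ages` Series is hypothetical). It raises for me whenever a NaN is present, and the same conversion goes through once the NaN rows are dropped, so I assume the missing values are the cause:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for atp_main['winner_age'] with one missing value
ages = pd.Series([23.1, np.nan, 31.7], name="winner_age")

try:
    ages.astype(int)
    error_message = None
except ValueError as exc:
    # pandas refuses to cast NaN to a plain integer dtype
    error_message = str(exc)

print(error_message)

# Without the NaN rows the cast works (decimals are truncated)
converted = ages.dropna().astype(int)
print(converted.tolist())
```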
Many thanks in advance!