
I have a tennis dataframe called atp_main with 50 columns and about 160,000 rows. It has a column called winner_age (and a corresponding loser_age column). At first, these ages showed up as long decimals, e.g. 22.321234342..., 25.23345565665..., etc. The dtype of these columns was 'object'.

I cleaned the data so that the ages display as whole numbers, i.e. 22, 25, 31, etc. I did this by converting the column to a float dtype, as there appeared to be issues with converting to an int:

atp_main['winner_age'] = pd.to_numeric(atp_main['winner_age'], errors='coerce')

This column's dtype is now float. I then set the display format to show nothing after the decimal point, i.e. 23 rather than 23.123212432...:

pd.options.display.float_format = '{:,.0f}'.format

When I call this column and print its values:

atp_main['winner_age']

it prints a list of the ages like so:

23
22
21
31
18

However, when I apply a function like mean(), it returns a value with a long decimal part:

atp_main['winner_age'].mean()
23.4353423354545
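
A likely explanation, sketched here with made-up values: `pd.options.display.float_format` only changes how pandas prints a Series or DataFrame, so the stored floats are untouched and `mean()` still returns a full-precision number. The sample series below is a hypothetical stand-in for `atp_main['winner_age']`; rounding or truncating the result explicitly is one way to get a whole number.

```python
import pandas as pd

# Hypothetical stand-in for atp_main['winner_age']; the values are made up.
ages = pd.Series([22.9, 25.9, None, 31.9], name="winner_age")

# Affects only how pandas prints Series/DataFrames, not the stored values.
pd.options.display.float_format = "{:,.0f}".format

mean_age = ages.mean()               # NaN is skipped by default
rounded_mean = int(round(mean_age))  # nearest whole number
truncated_mean = int(mean_age)       # drops the fractional part instead

print(rounded_mean, truncated_mean)
```

Whether rounding or truncating is appropriate depends on how the age should be reported; the two can differ by one.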

Ideally, I would like the winner_age column to be an int and not a float. I tried to convert it like so:

atp_main['winner_age'].astype(int)

But I got this error:

ValueError: Cannot convert non-finite values (NA or inf) to integer

Note that the dataframe has quite a number of NaN values in the first portion of the dataset.
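
The NaN values are the reason `astype(int)` fails, since NumPy's integer dtype has no way to represent a missing value. A minimal sketch of one possible workaround, assuming a pandas version that provides the nullable `Int64` extension dtype (capital "I"): floor the floats first, then cast. The small `atp_sample` frame and its values are hypothetical and only stand in for `atp_main`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for atp_main; the values are made up.
atp_sample = pd.DataFrame(
    {"winner_age": ["22.321234", None, "25.233455", "31.9"]}
)

# object -> float64; unparseable or missing entries become NaN.
ages = pd.to_numeric(atp_sample["winner_age"], errors="coerce")

# Floor before casting: non-whole floats cannot be cast safely to Int64.
# Missing values are kept as <NA> instead of raising an error.
atp_sample["winner_age"] = np.floor(ages).astype("Int64")

print(atp_sample["winner_age"])
```

Flooring matches the usual meaning of age in completed years; `ages.round()` could be used instead if rounding to the nearest year is preferred.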

Many thanks in advance!

Mazz
  • You cannot actually make it an integer, but you can certainly clean up the display with `np.round`: `df['col_name'] = np.round(df.col_name, 0)` – ALollz Apr 12 '18 at 19:51
  • To simplify the question for my own sake - I want to round the answer I get from applying .mean() to a column of floats i.e. atp_main['winner_age'].mean(). I want an answer of 25, not 25.7353753737 – Mazz Apr 12 '18 at 19:56
