2

The following is my df: https://www.dropbox.com/s/nbez3esbo8fedmf/aapl.csv?dl=0

date        ticker  open    high    low     close   adjClose
2019-07-08  AAPL    50.2025 50.35   49.6025 50.005  48.516
2019-07-09  AAPL    49.8    50.3775 49.7025 50.31   48.8119
2019-07-10  AAPL    50.4625 50.9325 50.39   50.8075 49.2946
2019-07-11  AAPL    50.8275 51.0975 50.4275 50.4375 48.9356
2019-07-12  AAPL    50.6125 51.0    50.55   50.825  49.3116
2019-07-15  AAPL    51.0225 51.4675 51.0    51.3025 49.7748
2019-07-16  AAPL    51.1475 51.5275 50.875  51.125  49.6026
2019-07-17  AAPL    51.0125 51.2725 50.8175 50.8375 49.3237
2019-07-18  AAPL    51.0    51.47   50.925  51.415  49.884

I'd like to round the close column to 2 decimal places. I tried the following:

df['close'] = round(df['close'], 2)
df.loc[:, 'close'] = df.loc[:, 'close'].round(2)
df.loc[:, 'close'] = df.loc[:, 'close'].apply(lambda x: Decimal(x).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))
df.loc[:, 'close'] = df.loc[:, 'close'].apply(lambda x: Decimal(x).quantize(Decimal('0.01')))
df.loc[:, 'close'] = np.round(df.loc[:, 'close'], 2)

But the best I can do is this:

date        ticker  open    high    low     close   adjClose
2019-07-08  AAPL    50.2025 50.35   49.6025 50.01   48.516
2019-07-09  AAPL    49.8    50.3775 49.7025 50.31   48.8119
2019-07-10  AAPL    50.4625 50.9325 50.39   50.81   49.2946
2019-07-11  AAPL    50.8275 51.0975 50.4275 50.44   48.9356
2019-07-12  AAPL    50.6125 51.0    50.55   50.83   49.3116
2019-07-15  AAPL    51.0225 51.4675 51.0    51.30   49.7748
2019-07-16  AAPL    51.1475 51.5275 50.875  51.13   49.6026
2019-07-17  AAPL    51.0125 51.2725 50.8175 50.84   49.3237
2019-07-18  AAPL    51.0    51.47   50.925  51.41   49.884

The date 2019-07-18 should be 51.42, but I got 51.41. And depending on which of the five ways I used, some can't even round 2019-07-08 50.005 & 2019-07-12 50.825 appropriately because I got 50 and 50.82 instead of 50.01 and 50.83.

So how can I round it properly?

saga
  • 736
  • 2
  • 8
  • 20
  • 1
    try the following format: df['close'] = round(df['close'], 2) – Arwa Jul 09 '23 at 21:59
  • Python uses "banker's rounding". 5 rounds up if the next higher digit is even, and down if the next higher digit is odd. So .505 becomes .51, and .515 also becomes .51. To avoid this, you can add .00001 to your column first. – Tim Roberts Jul 09 '23 at 22:41

2 Answers2

1

You can read the CSV file as string (don't convert to floats) and then use Decimal to convert the numbers appropriately:

import pandas as pd
from decimal import Decimal, localcontext, ROUND_HALF_UP

df = pd.read_csv('aapl.csv', dtype=str)  # <-- read the values from csv as string

cols = ['open', 'high', 'low', 'close', 'adjClose']

with localcontext() as ctx:
    ctx.rounding = ROUND_HALF_UP
    df[cols] = df[cols].apply(lambda row: [f'{Decimal(v):.2f}' for v in row])

interesting_dates = ['2019-07-18', '2019-07-08', '2019-07-12']
print(df[df['date'].isin(interesting_dates)])

Prints:

         date ticker   open   high    low  close adjClose
0  2019-07-08   AAPL  50.20  50.35  49.60  50.01    48.52
4  2019-07-12   AAPL  50.61  51.00  50.55  50.83    49.31
8  2019-07-18   AAPL  51.00  51.47  50.93  51.42    49.88
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
0

The round() function in Python rounds to the nearest even number.

This allows to round both up and down algorithmically in a series of numbers, which avoids skewing the data. Most rounding functions used in Python use similar mechanisms in order to not skew your data upwards.

AxelOruse
  • 46
  • 3