Python script getting different subtotals on the same csv with the same code as a co-worker

Question

Hi all, I am working in Python and running a script on a CSV to get totals and subtotals. For some reason, I am getting a slightly different subtotal in one of the columns than my co-worker, who is running the exact same Python script on the same CSV file. I am at a loss as to what is happening. Any help would be greatly appreciated. I thought the issue might be that I was using the PyCharm IDE, so I removed it and now use only Jupyter Notebook, as my co-worker does. I made sure that we both have the most up-to-date Python, pandas, and NumPy. If anyone has experienced this issue, I would be happy to hear how you resolved it. I have included simplified code below to give the gist of the script being used.

import pandas as pd
import numpy as np

filename = "/Users/X/Downloads/2022.csv"
df = pd.read_csv(filename, index_col=None)
print("df is loaded")
lookup_key = {
    "1": "A",
    "2": "B",
    "3": "C",
    "4": "D",
    "5": "E",
    "6": "F",
    "7": "G",
    "8": "H",
}
df["from"] = df["from"].map(lookup_key)
df["to"] = df["to"].map(lookup_key)
# Anything that did not map to a known label is treated as a regular user.
df.loc[~df["to"].isin(["A", "B", "C", "D", "E", "F", "G", "H"]), "to"] = "user"
df.loc[~df["from"].isin(["A", "B", "C", "D", "E", "F", "G", "H"]), "from"] = "user"
table = df[["to", "from", "value"]]
print(table)
final = pd.pivot_table(
    table,
    index=["to"],
    columns=["from"],
    aggfunc=np.sum,
    margins=True,
    margins_name="Total",
).fillna("")
print(final)
final.to_csv("/Users/X/Downloads/2022final.csv", index=True)

Can you post an example CSV file for which this happens? What are the different outputs you get? — Cam, Apr 20 '22 at 16:11
The total I'm getting is 1000127975044832128 vs 1000127975044844160 — Craig, Apr 20 '22 at 17:12
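Both reported totals are close to 1e18, far above 2**53, the point past which float64 can no longer represent every integer. The sketch below, which assumes (the question does not confirm it) that the `value` column is summed as float64, checks whether each total is exactly representable as a float and how many float64 steps separate them. The second half uses made-up values, not data from the CSV, to show that adding the same floats in a different order can change the result at this magnitude. `naive_sum` is a hypothetical helper that adds left to right.

```python
import math

# Totals reported in the comment above.
total_a = 1000127975044832128
total_b = 1000127975044844160


def is_exact_float64(n):
    """True if the integer n survives a round trip through a float64."""
    return int(float(n)) == n


# Gap between neighbouring float64 values at this magnitude.
spacing = math.ulp(float(total_a))
steps_apart = (total_b - total_a) / spacing

print(is_exact_float64(total_a), is_exact_float64(total_b))
print(spacing, steps_apart)


def naive_sum(values):
    """Plain left-to-right float addition (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Made-up values: the same numbers, added in two different orders.
big_first = naive_sum([1e18] + [1.0] * 1000)
small_first = naive_sum([1.0] * 1000 + [1e18])
exact = 10**18 + 1000  # Python ints are arbitrary precision, so this is exact

print(big_first == small_first)
print(int(big_first), int(small_first), exact)
```

Both totals turn out to be exact multiples of the float64 spacing at that magnitude (128) and sit 94 steps apart. That is consistent with the sum being carried out in floating point, but it does not by itself identify where the two runs diverge.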
You are doing a lot of conversions in your dataframe and then a pivot. One idea to track where a possible error is getting introduced would be to run `df.describe()` and `df.info()` after each and every step and then compare the output on the two systems. For the pivoting, you may want to make a deep copy of the dataframe, pivot it, and check the pivoted version separately from the aggregation function and `fillna()`. Are the same kinds of machines involved? Not Mac vs. Windows? Are you sure the versions the notebooks use are the same? In other words, are you checking the packages inside the notebooks and not just in a terminal? — Wayne, Apr 20 '22 at 17:23
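One way to act on the version and machine questions above is to run the same small report in a notebook cell on both machines and compare the output line by line. This is a minimal sketch using only the standard library; `environment_report` and `file_sha256` are hypothetical helper names, and the checksum step assumes both people want to confirm that their copies of the CSV are byte-for-byte identical.

```python
import hashlib
import platform
import sys
from importlib import metadata


def environment_report(packages=("pandas", "numpy")):
    """Collect interpreter, OS, and package versions as seen by the running kernel."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so two people can confirm they hold identical bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
    # Hypothetical usage; substitute the real CSV path:
    # print(file_sha256("/Users/X/Downloads/2022.csv"))
```

Because the report is produced by the kernel that actually runs the notebook, it avoids the case where a terminal shows one set of package versions while the notebook uses another environment.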
