I'm handling some data from an Excel file with Python's pandas library and creating two new columns out of this data. I would like to add those columns on the far right of the existing ones in the Excel file itself, without messing with the already present data. For this reason I cannot use pd.to_excel()
, because this overwrites existing data and the entire sheet looses its formatting.
I'm looking into OpenPyXL for the saving of the new data (it's not great having to read the file twice, but handling headers with OpenPyXL and passing them to pandas looks a bit messy to me). The problem is the ws.append()
function inserts data on the bottom of the first column (column A), and I want to do this at the top (row 1) of the first available free column (lets say columns C and D for example). Ideally this should take the data from two pandas Series objects (representing the new columns) or from concatenated columns of the entire pandas DataFrame on which I'm working. And preferably the script should find the first empty cell automatically, although this is not mandatory.
Any ideas? A similar question has been asked here, but there the data is copied from one sheet to another and I can't seem to make it work for me.
Sample code:
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
df = pd.read_excel("file.xlsx")
# New column:
df["Three"] = df["Two"].apply(lambda x: x**2)
# Saving:
file = load_workbook(filename="file.xlsx")
sheet = file["Sheet1"]
for row in dataframe_to_rows(df["Three"], index=False, header=True):
sheet.append(row)
file.save("file.xlsx")
The sample Excel file as a DataFrame:
One Two
0 a 1
1 b 2
2 c 3
As I understand it, dataframe_to_rows()
works with the whole DataFrame and doesn't let me pass in a pandas Series or just one column from a DataFrame. That's why the above code copies the whole DataFrame at the bottom of the Excel file, where I just want the df["Three"]
column inserted into column C of Excel, with the header.