Combine multiple csv files into a single xls workbook Python 3

Question

We are in the transition at work from python 2.7 to python 3.5. It's a company wide change and most of our current scripts were written in 2.7 and no additional libraries. I've taken advantage of the Anaconda distro we are using and have already change most of our scripts over using the 2to3 module or completely rewriting them. I am stuck on one piece of code though, which I did not write and the original author is not here. He also did not supply comments so I can only guess at the whole of the script. 95% of the script works correctly until the end where after it creates 7 csv files with different parsed information it has a custom function to combine the csv files into and xls workbook with each csv as new tab.

import csv
import xlwt
import glob
import openpyxl
from openpyxl import Workbook

Parsefiles = glob.glob(directory + '/' + "Parsed*.csv")
def xlsmaker():
    for f in Parsefiles:
        (path, name) = os.path.split(f)
        (chort_name, extension) = os.path.splittext(name)
        ws = wb.add_sheet(short_name)
        xreader = csv.reader(open(f, 'rb'))
        newdata = [line for line in xreader]
        for rowx, row in enumerate(newdata)
            for colx, value in enumerate(row):
                if value.isdigit():
            ws.write(rowx, colx, value)

xlsmaker()

for f in Parsefiles:
    os.remove(f)

wb.save(directory + '/' + "Finished" + '' + oshort + '' + timestr + ".xls")

This was written all in python 2.7 and still works correctly if I run it in python 2.7. The issue is that it throws an error when running in python 3.5.

File "parsetool.py", line 521, in (module)
  xlsmaker()
File "parsetool.py", line 511, in xlsmaker
  ws = wb.add_sheet(short_name)
File "c:\pythonscripts\workbook.py", line 168 in add_sheet
  raise TypeError("The paramete you have given is not of the type '%s'"% self._worksheet_class.__name__)
TypeError: The parameter you have given is not of the type "Worksheet"

Any ideas about what should be done to fix the above error? Iv'e tried multiple rewrites, but I get similar errors or new errors. I'm considering just figuring our a whole new method to create the xls, possibly pandas instead.

score 29 · Accepted Answer · edited Jan 11 '20 at 18:39

29

Not sure why it errs. It is worth the effort to rewrite the code and use pandas instead. Pandas can read each csv file into a separate dataframe and save all dataframes as a separate sheet in an xls(x) file. This can be done by using the ExcelWriter of pandas. E.g.

import pandas as pd
writer = pd.ExcelWriter('yourfile.xlsx', engine='xlsxwriter')
df = pd.read_csv('originalfile.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()

Since you have multiple csv files, you would probably want to read all csv files and store them as a df in a dict. Then write each df to Excel with a new sheet name.

Multi-csv Example:

import pandas as pd
import sys
import os

writer = pd.ExcelWriter('default.xlsx') # Arbitrary output name
for csvfilename in sys.argv[1:]:
    df = pd.read_csv(csvfilename)
    df.to_excel(writer,sheet_name=os.path.splitext(csvfilename)[0])
writer.save()

(Note that it may be necessary to pip install openpyxl to resolve errors with xlsxwriter import missing.)

edited Jan 11 '20 at 18:39

Stabledog

3,110
2
32
43

answered Feb 07 '17 at 14:59

user5550905

336
3
5

Thanks for the reply. What really solved the problem was fixing some of the encoding of the csv files. On top of that some of the files had to have specific encoding different than the workbook to get the script to function. Totally weird, but it works and now I have four functioning scripts all under a PyQt GUI. I'll probably add the pandas in to ensure that it'll work in the future. – Tobias Wright Feb 08 '17 at 14:23
Hey @TobiasWright Is there a way you can post the code in here. I'm trying to follow. Thank you. – JPC Jun 02 '17 at 17:33
`NameError: name 'xlsxwriter' is not defined`. Also, it seems `pandas` is assuming that `openpyxl` is available. Can you make this answer more complete? (I'm starting from miniconda, if that answers why I don't have everything installed.) – r2evans Feb 14 '18 at 23:01
`NameError: name 'xlsxwriter' is not defined` is not working because it has to be sent by name. You are sending a module there. the correct line is: `writer = pd.ExcelWriter('yourfile.xlsx', engine='xlsxwriter')` send 'xlsxwriter' as a string – hecvd Jul 18 '18 at 17:29
@r2evans: when I did this, I had to run `pip install openpyxl` to make it work. Seems that pandas doesn't do that part for you. – Stabledog Jan 11 '20 at 18:31
If you get a Pandas warning about the call to `writer.save()`, the answer is to use writer as a context manager instead. [Full example](https://stackoverflow.com/a/74277834/456550) – Christian Long Nov 01 '22 at 14:58

score 3 · Answer 2 · answered Nov 01 '22 at 14:57

Here's a slight extension to the accepted answer. Pandas 1.5 complains about the call to writer.save(). The fix is to use the writer as a context manager.

import sys
from pathlib import Path
import pandas as pd

with pd.ExcelWriter("default.xlsx") as writer:
    for csvfilename in sys.argv[1:]:
        p = Path(csvfilename)
        sheet_name = p.stem[:31]
        df = pd.read_csv(p)
        df.to_excel(writer, sheet_name=sheet_name)

This version also trims the sheet name down to fit in Excel's maximum sheet name length, which is 31 characters.

score 1 · Answer 3 · answered Apr 27 '21 at 18:35

You can use the code below, to read multiple .csv files into one big .xlsx Excel file. I also added the code for replacing ',' by '.' (or vice versa) for improved compatibility on windows environments and according to your locale settings.

import pandas as pd
import sys
import os
import glob
from pathlib import Path

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

writer = pd.ExcelWriter('fc15.xlsx') # Arbitrary output name
for csvfilename in all_filenames:

    txt = Path(csvfilename).read_text()
    txt = txt.replace(',', '.')

    text_file = open(csvfilename, "w")
    text_file.write(txt)
    text_file.close()
    
    print("Loading "+ csvfilename)
    df= pd.read_csv(csvfilename,sep=';', encoding='utf-8')

    df.to_excel(writer,sheet_name=os.path.splitext(csvfilename)[0])
    print("done")
writer.save()
print("task completed")

score 0 · Answer 4 · answered Nov 03 '22 at 17:02

If your csv file is in Chinese with gbk encoding, you can use the following code

import pandas as pd
import glob
import datetime
from pathlib import Path

now = datetime.datetime.now()

extension = "csv"
all_filenames = [i for i in glob.glob(f"*.{extension}")]


with pd.ExcelWriter(f"{now:%Y%m%d}.xlsx") as writer:
    for csvfilename in all_filenames:
        print("Loading " + csvfilename)
        df = pd.read_csv(csvfilename, encoding="gb18030")
        df.to_excel(writer, index=False, sheet_name=Path(csvfilename).stem)
        print("done")
print("task completed")

Combine multiple csv files into a single xls workbook Python 3

4 Answers4

Linked