0

Currently I am trying to merge multiple excel files into one Using python. What I have so far is as follows:

sharedDocs = "C:\\SPSharedDocuments\\*.xlsx"
invoices = "C:\\SPInvoices\\*.xlsx"
formsCerts = "C:\\SPForms&Certificates\\*.xlsx"
mgmt = "C:\\SPManagement\\*.xlsx"

files = [sharedDocs, invoices, formsCerts, mgmt]

for docs in files:
    excel = []
    for file in glob.glob(docs):
        excel.append(file)

    excels = [pd.ExcelFile(name) for name in excel]
    frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]
    frames_new = [df[1:] for df in frames[1:]]
    combined = pd.concat(frames_new)

    if sharedDocs == docs:
        combined.to_excel("SharedDocsMerged.xlsx", header = False, index = False)
    elif invoices == docs:
        combined.to_excel("InvoicesMerged.xlsx", header = False, index = False)
    elif formsCerts == docs:
        combined.to_excel("FormsCertsMerged.xlsx", header = False, index = False)
    else:
        combined.to_excel("MGMTMerged.xlsx", header = False, index = False)

This works but it does not copy the first header so that I know what the name for each column is. Before I had the line that read frames_new = [df[1:] for df in frames[1:]] as frames[1:] = [df[1:] for df in frames[1:]] but this was causing multiple copies of the same file.

All I need is it to copy one header so I know the value of each column.

Your help is much appreciated and thank you in advance.

UPDATE:

I tried using the post that put below suggesting it was a similar question and I edited my code to look like this:

sharedDocs = "C:\\SPSharedDocuments\\*.xlsx"
invoices = "C:\\SPInvoices\\*.xlsx"
formsCerts = "C:\\SPForms&Certificates\\*.xlsx"
mgmt = "C:\\SPManagement\\*.xlsx"

files = [sharedDocs, invoices, formsCerts, mgmt]

for docs in files:
    excel = []
    for file in glob.glob(docs):
        excel.append(pd.read_excel(file))
    df = pd.concat(excel, ignore_index=True, sort = True)

    if sharedDocs == docs:
        df.to_excel("SharedDocsMerged.xlsx", header = False, index = False)
    elif invoices == docs:
        df.to_excel("InvoicesMerged.xlsx", header = False, index = False)
    elif formsCerts == docs:
        df.to_excel("FormsCertsMerged.xlsx", header = False, index = False)
    else:
        df.to_excel("MGMTMerged.xlsx", header = False, index = False)

the result that I get is 2 extra columns on the left, a missing column and still no header.

RChicas23
  • 3
  • 3
  • 2
    Possible duplicate of [Append Multiple Excel Files(xlsx) together in python](https://stackoverflow.com/questions/46930575/append-multiple-excel-filesxlsx-together-in-python) – Shiva2794 Jan 07 '19 at 17:10
  • @Shiva2794 thanks for that link, i tried editing my code to do what that code does but it gives me two extra columns on the left and still no header – RChicas23 Jan 07 '19 at 18:55
  • By *merge*, do you really mean *append* to existing worksheets? Why are you setting `header=False`? – Parfait Jan 07 '19 at 19:14
  • @Parfait can you explain a bit more? if i change the header parameter in frames that means it takes the header from all the files. Or are you talking about adding it to combined? so it should be `combined = pd.concat(frames_new, header = True)` ? I'm going through 4 folders and the files in each folder have separate headers. Wouldn't what you suggesting give it the same header for everything? Header is false because it is just numbers 0 to N not the actual header with the names I put int the separate files – RChicas23 Jan 07 '19 at 19:18
  • Can you post current Excel data with those four data sources (few rows + headers) and desired Excel result? It is hard to understand your process without seeing any data. – Parfait Jan 07 '19 at 19:22
  • Sorry about this guys, turns out I had another merged files with those separate files so the separate files and the merged file were merging all together. After deleting that file and changing `frames_new = [df[1:] for df in frames[1:]]` to `frames[1:] = [df[1:] for df in frames[1:]]` everything lines up with the correct header (i used the original code i posted). I appreciate everyone's input. As I was looking for the files to show @Parfait was when i noticed my mistake. Thank you again. – RChicas23 Jan 07 '19 at 20:20

0 Answers0