0

Hey everybody thanks in advance! I have been troubleshooting some code and can't seem to figure out whats wrong. I have a folder on my computer with a bunch of pdf files, each file is named with the "D1234-12345" format. I also have an excel file with the first column (column A) being the string before the "-", in this example it would be "D1234". The third column (column C) contains an email address associated with the column A account. My goal is to loop through the folder, reference the excel sheet and find the email associated with that invoce, then create an email for the correct pdf and send the pdf over. I've attached what I have come up with, I'm writing it in python and using pandas to parse the excell file. The program seems to be angry with the column and row declarations. Disclaimer!! I am more of a networking guy and don't have much experince with pandas, that being said most of this is code I found on stack exchange and modified. Any insight would be very welcome, thanks for the read!

import os
import pandas as pd
import win32com.client

#change this path as needed... this sets the file that will be parsed
file = 'C:\invoices\invoices.xlsx'
# Check that the "Dealer_Number" column is present 
df = pd.read_excel(file, usecols="A,C", names=['Dealer_Number', 'Email_Address'])
# Check if Excel file is empty
if df.empty:
    df = pd.read_excel(file) # re-read the Excel file with all columns
    if df.empty:
        print("Error: Excel file is still empty. Please check your Excel File.")
        exit()

#Change this path too... this sets the folder where the PDFs we are trying to mail are
folder = 'C:\invoicesPDF'

#loop for finding the dealer number & email addresses
for file in os.listdir(folder):
#join them
    pdf_file = os.path.join(folder, file)
    # extract the characters before the dash
    text_before_dash = file.split('-')[0]
    # use pandas to access the row values by the extracted text in the excel file
    row = df.loc[df['Dealer_Number'] == text_before_dash] 
    #error handling for if row is empty:
    if row.empty:
        print("Error: Dealer_Number {} dosnt exist".format(text_before_dash))
        exit() 
    #now we find the email address by setting the row to the previous value found
    #and then using the 'C' we defined earlier to jump over then pull the value with [0]
    to_email = row['Email_Address'].values[0]
    #boot up outlook
    outlook = win32com.client.Dispatch("Outlook.Application")
    mail = outlook.CreateItem(0)
    mail.To = to_email
    mail.Subject = 'Yeti Invoice'
    #make the body happy...
    mail.body = 'Hello, please find the attached invoice for your records. \n\n Thank you.'   
    attachment = mail.Attachments.Add(pdf_file)
    #error handling for if mail fails.
    try:
        mail.Send()
    except Exception as e:
        print("Failed to send email: {}".format(e))

The new error with the error checks is:

Error: Dealer_Number .DS_Store dosnt exist

The error that the code without error checks is:

    to_email = row['Email_Address'].values[0]
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

I think the rows/columns are not being found properly or the folder the pdfs are located in is not being accessed. Guesswork here, any insight is welcome thanks again!

1 Answers1

0

Your problem is that you are selecting columns A through C with usecols="A:C" while at the same time only specifying two column names with names=['Dealer_Number', 'Email_Address']. You should either provide a third column name or change your usecols string to "A,C". The difference is that the colon defines a range of columns, where the comma separates them. Want to know more? here are the docs.

Hein Gertenbach
  • 318
  • 2
  • 13
  • Thank you for linking the documentation that was helpful to gain some insight on how everything works! The "," did the trick and I am now able to parse the file. I am running into a new issue, I updated the original question with the error. Any thoughts on what is going wrong? I'll add the error that is thrown by the original code as it is different than the error I get with the checks. Thank you again! – Luke Shacter Jun 16 '23 at 18:51
  • Also, look at this to maybe help with your question> https://stackoverflow.com/questions/28382735/python-pandas-does-not-read-the-first-row-of-csv-file – Hein Gertenbach Jun 16 '23 at 19:34