6

I wrote a python script that will pull excel files from a folder and write them into a SQL table. I got the code to work, but only if I delete the first line of the excel file which contains the headers. I'm new to Python so this is probably something simple, but I looked at a lot of different techniques and couldn't figure out how to insert it into my code. Any ideas would be greatly appreciated!

# Import arcpy module
from xlrd import open_workbook ,cellname
import arcpy
import pyodbc as p

# Database Connection Info
server = "myServer"
database = "my_Tables"
connStr = ('DRIVER={SQL Server Native Client 10.0};SERVER=' + server + ';DATABASE=' + database + ';' + 'Trusted_Connection=yes')

# Assign path to Excel file
file_to_import = '\\\\Location\\Report_Test.xls'

# Assign column count
column_count=10

# Open entire workbook
book = open_workbook(file_to_import)

# Use first sheet
sheet = book.sheet_by_index(0)

# Open connection to SQL Server Table
conn = p.connect(connStr)

# Get cursor
cursor = conn.cursor()

# Assign the query string without values once, outside the loop
query = "INSERT INTO HED_EMPLOYEE_DATA (Company, Contact, Email, Name, Address, City, CentralCities, EnterpriseZones, NEZ, CDBG) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"

# Iterate through each row

for row_index in range(sheet.nrows):

    row_num          = row_index
    Company          = sheet.cell(row_index,0).value
    Contact          = sheet.cell(row_index,1).value
    Email            = sheet.cell(row_index,2).value
    Name             = sheet.cell(row_index,3).value
    Address          = sheet.cell(row_index,4).value
    City             = sheet.cell(row_index,5).value
    CentralCities    = sheet.cell(row_index,6).value
    EnterpriseZones  = sheet.cell(row_index,7).value
    NEZ              = sheet.cell(row_index,8).value
    CDBG             = sheet.cell(row_index,9).value

    values = (Company, Contact, Email, Name, Address, City, CentralCities, EnterpriseZones, NEZ, CDBG)

    cursor.execute(query, values)

# Close cursor
cursor.close()

# Commit transaction
conn.commit()

# Close SQL server connection
conn.close()
LauraJ
  • 63
  • 1
  • 2
  • 5
  • 2
    Sounds like you just need to change `for row_index in range(sheet.nrows):` to `for row_index in range(1, sheet.nrows):` to skip the first row. – Aya May 31 '13 at 14:51
  • There is nothing in Excel which says "this is a header row" so you either have to know how many lines to skip (Aya's comment) or guess. – msw May 31 '13 at 14:52

2 Answers2

10

You can initialize the iteration at the second row. Try the following:

for row_index in range(1,sheet.nrows):

Edit: If you need to iterate over a list of .xls files, as you asked in the comments, the basic idea is to perform an external loop over the files. Here it comes some hints:

# You need to import the os library. At the beinning of your code
import os

...
# Part of your code here
...

# Assign path to Excel file
#file_to_import = '\\\\Location\\Report_Test.xls'
folder_to_import = '\\\\Location'
l_files_to_import = os.listdir(folder_to_import)
for file_to_import in l_files_to_import:
    if file_to_import.endswith('.xls'):
        # The rest of your code here. Be careful with the indentation!
        column_count=10
        ...
Miquel
  • 858
  • 11
  • 20
  • Aya, it worked brilliantly! Thank you! I figured it was something really obvious and I was just over-thinking it. – LauraJ May 31 '13 at 15:02
  • If I wanted to take it further and select all the excel files located in a folder.... would it be something like '*.xls' ? I tried this but it doesn't seem to like my syntax.. – LauraJ May 31 '13 at 15:11
  • You need an external loop to iterate over the files. You can execute a command to list all files and then iterate over them. This would be an external loop to the one you already have – Miquel May 31 '13 at 15:23
  • Where in the script would I put the external loop? I was thinking of changing the file_to_import to only point to the folder containing all the XLS files then start the external loop right after that. I'm sorry, i'm very new to python. – LauraJ May 31 '13 at 15:54
  • Hey Miquel, thank you for the help! The only thing that is keeping the script from running is that i'm getting an "IOError: [Errno 13] Permission denied". I have all the permissions correctly set to the folder where the XLS files are that it's trying to read from, so i'm not sure what the issue is. – LauraJ Jun 03 '13 at 18:21
  • Hi, once it happened a similar thing to me and the problem was that the Excel document was opened when executing the script. I needed to close it before running the program. Maybe it can help you! – Miquel Jun 03 '13 at 21:41
  • I think it was throwing this error because I assigned the wrong thing for "book = open_workbook(file_to_import)". I was using folder_to_import so it didn't like that. Now i think my problem is setting up the iteration loop that is supposed to start after "if file_to_import.endswith('.xls'):" based on what I found online i'm thinking i should use a "for" loop, such as "for each_file in file_to_import" but i'm stumped up to there. I'm sorry, i'm very new to this. – LauraJ Jun 04 '13 at 14:24
  • So I figured out that I needed to put everything after the already existing for loop "for file_to_import in l_files_to_import:" into that loop as well. Basically i just made sure the indentation was correct, but now i'm getting the error "IOError: [Errno 2] No such file or directory: 'Report_Test.XLS'" I checked and that is one of the existing files in the folder so i'm not sure why it's saying it does not exist..... – LauraJ Jun 04 '13 at 16:03
  • I think I got something. I guess you are not executing the script in the `Location` folder. In this case the script does not find the `Report_Test.XLS` file as the variable `file_to_import` contains the file name but not the path to the file. Try to include the path in the name of the file you want to open. Specifically, modify the line in which you open the file for the following: `book = open_workbook(folder_to_import + '\\' + file_to_import)`. Good luck! – Miquel Jun 04 '13 at 20:51
0

or if you're lazy:

rows = 0
for row in ws.rows :
    if rows == 0 : 
        rows += 1
        continue
    ...

The advantage is that you won't have to fool with complicated cell dereferencing: you have the row object and can just do a row[col].value. Tidier.

Alex Rudnicky
  • 246
  • 2
  • 6