how to extract data from one excel workbook and output to another using python xlrd/xlwt?

Question

I am trying to write a script which will automate a copy/paste of employee time sheets from several files to one compiled file. Since they are time sheets with project codes some cells are left blank where an employee worked on a different project that day. Also the files have been converted from xlsx(2007) to .csv.xls which xlrd seems to open just fine.

I do know how to open and create a book object but my knowledge of this module is very limited so I thought maybe a general algorithm would be helpful:

import xlrd, xlwt

put all following in for or while loop to iterate through files:
book = xlrd.open_workbook('mybook.csv.xls')
extract data; store data for ouput
use for loop to iterate over data, output to final sheet
open next file, repeat process storing each output below the previous

I am looking for anything that will help me find the answers, not just code. Any help would be appreciated. Thanks.

Also I am trying to read columns {A:T} and rows {3:27} which include str and float dtypes. I'm pretty sure that a list with type string for existing values and nan or null any empty space would be easy to work with. — Corey, Jun 01 '11 at 19:27
To read/write xlsx files in python use `openpyxl` (https://bitbucket.org/ericgazoni/openpyxl/downloads). However, the real solution is to stop having employees store their time sheets in excel files! — Steven Rumbalski, Jun 01 '11 at 19:28
Indeed. Thanks, I will checkout openpyxl. Also I have worked through most of this problem already so unless you really want to; no more info is needed but thank you! — Corey, Jun 01 '11 at 19:58

score 1 · Answer 1 · edited May 23 '17 at 10:29

I am creating a class called excel functions for xlutils, xlrd, and xlwt that I might ultimately make a library. If you are interested in helping I am trying to make a delete sheet function.

You might want to move towards openpyxl and/or pyexcel because they are easier and have functions for this.

Here is how to copy using open pyxl: Copy whole worksheet with openpyxl

Here is the documentation for pyexcel which is a wrapper for xlwt, xlrd, and xlutils: https://pyexcel.readthedocs.io/en/latest/

If you want to extract data from one excel workbook and output to another you would want to use createCopy(original workbook, other workbook, original filename, new filename)

import xlwt
import xlrd
import xlutils.copy
import xlutils class excelFunctions():

def getSheetNumber(self, fileName, sheetName):

    # opens existing workbook
    workbook = xlrd.open_workbook(fileName, on_demand=True)

    #turns sheet name into sheet number       
    for index, sheet in enumerate(workbook.sheet_names()):
        if sheet == sheetName:
            return index


def createSheet(self, fileName, sheetName):

    # open existing workbook
    rb = xlrd.open_workbook(fileName, formatting_info=True, on_demand=True)

    # make a copy of it
    wb = xl_copy(rb)

    # creates a variable called sheets which stores all the sheet names
    sheets = rb.sheet_names()

    # creates a string which is equal to the sheetName user input
    str1 = sheetName

    # checks to see if the given sheetName is a current sheet
    if (str1 not in sheets):

        # add sheet to workbook with existing sheets
        Sheet = wb.add_sheet(sheetName)

        # save the sheet with the same file name as before
        wb.save(fileName)

    else:
        # this declares the sheet variable to be equal to the sheet name the user gives
        sheet = wb.get_sheet(self.getSheetNumber(fileName, sheetName))

        # save the sheet with the same file name as before
        wb.save(fileName)



def createCopy(self, fileName, fileName2, sheetName, sheetName2):


    # open existing workbook
    rb = xlrd.open_workbook(fileName, formatting_info=True)

    # defines sheet as the name of the sheet given
    sheet = rb.sheet_by_name(sheetName)

    # makes a copy of the original sheet
    wb = xl_copy(rb)

    # creates an int called column_count which is equal to the sheets maximum columns 
    column_count = sheet.ncols - 1

    # creates a blank array called stuff
    Stuff = []

    # this loops through adding columns from the given sheet name
    for i in range (0, column_count):
        Stuff.append([sheet.cell_value(row, i) for row in range(sheet.nrows)])

    # create a sheet if there is not already a sheet    
    self.createSheet(fileName, sheetName2)

    # defines sheet as the new sheet
    sheet = wb.get_sheet(self.getSheetNumber(fileName, sheetName2))

    # this writes to the sheet
    for colidx, col in enumerate(Stuff):
        for rowidx, row in enumerate(col):
            sheet.write(rowidx, colidx, row)

    # this saves the file
    wb.save(fileName2)

Hello Nick, sorry for the late response... I just checked in here today and apparently its been a while, an incredibly long while. How are you coming along? did you continue your efforts? I am dusting off python at the moment and would be interested to hear from you... my email is still the same and located in the comments above... (a little too late to undo that bit of genius) — Corey, Nov 04 '16 at 00:33

score 1 · Accepted Answer · answered Jun 04 '11 at 12:57

This might help ... it reproduces your data as closely as possible (dates remain as dates, empty cells don't become text cells with 0-length contents, booleans and error cells don't become number cells).

from xlrd import XL_CELL_EMPTY, XL_CELL_TEXT, XL_CELL_NUMBER,
    XL_CELL_DATE, XL_CELL_BOOLEAN, XL_CELL_ERROR, open_workbook
from xlwt import Row, easyxf, Workbook

method_for_type = {
    XL_CELL_TEXT:    Row.set_cell_text,
    XL_CELL_NUMBER:  Row.set_cell_number,
    XL_CELL_DATE:    Row.set_cell_number,
    XL_CELL_ERROR:   Row.set_cell_error,
    XL_CELL_BOOLEAN: Row.set_cell_boolean,
    }

date_style = easyxf(num_format_str='yyyy-mm-dd')
other_style = easyxf(num_format_str='General')

def append_sheet(rsheet, wsheet, wrowx=0):
    for rrowx in xrange(rsheet.nrows):
        rrowvalues = rsheet.row_values(rrowx)
        wrow = wsheet.row(wrowx)
        for rcolx, rtype in enumerate(rsheet.row_types(rrowx)):
            if rtype == XL_CELL_EMPTY: continue
            wcolx = rcolx
            wmethod = method_for_type[rtype]
            wstyle = date_style if rtype == XL_CELL_DATE else other_style
            wmethod(wrow, wcolx, rrowvalues[rcolx], wstyle) 
        wrowx += 1
    return wrowx

if __name__ == '__main__':
    import sys, xlrd, xlwt, glob
    rdpattern, wtfname = sys.argv[1:3]
    wtbook = Workbook()
    wtsheet = wtbook.add_sheet('guff')
    outrowx = 0
    for rdfname in glob.glob(rdpattern):
        rdbook = open_workbook(rdfname)
        rdsheet = rdbook.sheet_by_index(0)
        outrowx = append_sheet(rdsheet, wtsheet, outrowx)
        print outrowx
    wtbook.save(wtfname)

Actually I was going to email you! .csv.xls is the extension given to a file that has been converted from xl2007 to csv by "save as" using the extension .csv, my hope in contacting you was to try out the beta for 2007, xlsxrd, since all the files I am dealing with are 2007. It would be great for the entire dept since we are almost all python users! Also I have been trying openpyxl but xlrd was easier for me (beginner) to understand and begin using with a more comprehensive doc, If possible it would be greatly appeciated. — Corey, Jun 05 '11 at 20:36
@Corey: I'll send you the xlsxrd beta as soon as I know your email address. — John Machin, Jun 05 '11 at 21:32
wcpapine | mtu.edu, thank you very much! we will put it to good use. — Corey, Jun 06 '11 at 18:18

how to extract data from one excel workbook and output to another using python xlrd/xlwt?

2 Answers2