Need some help here, guys.

This is my code:

    import xlutils
    import xlrd
    import os
    import sys

    # Raw string, so a single backslash is enough.
    datafile = r'C:\someexcelfileediting.xlsx'
    workbook = xlrd.open_workbook(datafile)
    stone = workbook.sheet_by_name(input('What is the name of the sheet you are trying to reference?  ').upper())
    paper = workbook.sheet_by_name(input('what sheet would you like to check?  ').upper())

    def check_Base():
        # Collect every cell value from the sheet being checked into a set.
        set2 = set()
        for row in range(0, paper.nrows):
            for col in range(0, paper.ncols):
                set2.add(paper.cell_value(row, col))
        print(len(set2))
        print(set2)

    check_Base()

What I end up with is 79 of the 91 values in the Excel sheet it iterates over, and I do not understand why it excludes 12 entries in the file. There doesn't seem to be a pattern to the data it omits: the missing values come from different rows and columns. Any help would be appreciated.

Thanks, Will

Will
  • Just to be clear: do you understand the difference between a `set` and a `list`? – DSM Mar 10 '15 at 18:01
  • Are you sure a `set` is the appropriate data structure? A set is going to be unordered and eliminate duplicates. – Joe Holloway Mar 10 '15 at 18:01
  • I would like to use a set so I can use `set.difference` (`set3 = set1 - set2`) to find the strings that are in one sheet but not in the other, and then print those values to a different sheet. – Will Mar 10 '15 at 18:04
  • @JoeHolloway Thanks, I didn't realize that saving the values in a set would remove duplicates. One of my columns has the same value in every cell, so all but one copy of it was removed (twelve cells), and that is why my total is off. Thank you. – Will Mar 10 '15 at 18:21
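
The set difference Will describes in the comments could be sketched roughly as follows. This is only an illustration: the `sheet_values` helper and the row data are hypothetical stand-ins for the `stone` and `paper` sheets, so that the example runs without an Excel file.

```python
def sheet_values(rows):
    """Collect every cell value from a list of rows into a set."""
    return {value for row in rows for value in row}

# Hypothetical data standing in for the `stone` and `paper` sheets.
stone_rows = [["alpha", "beta"], ["gamma", "delta"]]
paper_rows = [["alpha", "gamma"], ["gamma", "epsilon"]]

set1 = sheet_values(stone_rows)
set2 = sheet_values(paper_rows)

# Values present in the first sheet but missing from the second.
set3 = set1 - set2  # same result as set1.difference(set2)
print(sorted(set3))  # ['beta', 'delta']
```

Because the comparison only asks whether a value is present, the duplicates that a set discards do not affect the result.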

2 Answers

A set gives you an unordered collection of unique values. If your spreadsheet contains duplicate cell values, only the first occurrence is added to the set; adding the same value again has no effect.
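
A minimal sketch of that behavior, using a hypothetical list of cell values in place of a real sheet:

```python
# Hypothetical cell values standing in for a sheet; "x" appears three times.
cells = ["a", "x", "b", "x", "c", "x"]

unique = set()
for value in cells:
    unique.add(value)  # adding a value that is already present is a no-op

print(len(cells))   # 6
print(len(unique))  # 4
```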

Based on your comments, it sounds like you're just doing some debugging, but if you really need to count the cells that you've unpacked, one option is to append them to a list first and then convert that to a set later.

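    # Collect every cell value, duplicates included.
    mylist = []
    for row in range(0, paper.nrows):
        for col in range(0, paper.ncols):
            mylist.append(paper.cell_value(row, col))

    print(len(mylist))  # 91

    # Converting to a set drops the duplicate values.
    myset = set(mylist)

    print(len(myset))  # 79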
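
To see which values account for the gap between the two counts, one option is `collections.Counter`. The sketch below is an assumption-laden illustration rather than part of the original answer: `mylist` is filled with hypothetical data instead of values read through `paper.cell_value`.

```python
from collections import Counter

# Hypothetical stand-in for the list built from paper.cell_value(row, col).
mylist = ["id1", "N/A", "id2", "N/A", "id3", "N/A"]

counts = Counter(mylist)

# Keep only the values that occur more than once.
duplicates = {value: n for value, n in counts.items() if n > 1}
print(duplicates)  # {'N/A': 3}

# Cells lost when converting to a set: total cells minus unique values.
print(len(mylist) - len(counts))  # 2
```
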
Joe Holloway

I would say that, instead of creating a list and then converting it to a set, you should initialize an empty set and add the elements to it directly. The set takes care of deduplication automatically, and skipping the intermediate list should be more efficient.

    # Add each cell value straight into the set; duplicates are ignored.
    myset = set()
    for row in range(0, paper.nrows):
        for col in range(0, paper.ncols):
            myset.add(paper.cell_value(row, col))

    print(len(myset))  # 79
Swadeep
  • You're correct in your statement, but I don't think you understood the question being asked. Your 'solution' is the same code (with a different variable name) presented in the question. My answer was only intended to demonstrate the difference in counts between list and set, not as a recommendation in how to produce a set when that's the desired data structure. The questioner was already doing that, but confused by the result. – Joe Holloway Apr 16 '21 at 17:07