How to remove strange characters while using regex and xlrd module Python

Question

I am trying to read the second column of an Excel file using the xlrd module. The problem is that the second column also contains blank rows. I need to select only the cells that have values and skip the blank ones. Below is my code:

import xlrd
import sys
import re

workbook_name = sys.argv[1]
if workbook_name:
    book = xlrd.open_workbook(workbook_name)
    for sheet in book.sheet_names():
        if re.search(r'Munich',sheet):
            sh = book.sheet_by_name(sheet)
            #num_cols = sh.ncols
            for row_ids in range(0,sh.nrows):
                cell_obj = str(sh.cell(row_ids,1))
                blank_regex = re.compile(r'u\'\'')
                if not re.search(blank_regex,cell_obj):
                    #re.sub('^.+u'()',\1,cell_obj)
                    print(cell_obj)
else:
    print ("Please supply workbook_name")

When I run it, this is the output I get:

text:u'Dom0'
text:u'muclgd0008.dedc2.cloud.com'
text:u'muclgd0007.dedc2.cloud.com'
text:u'muclgd0006.dedc2.cloud.com'
text:u'muclgd0005.dedc2.cloud.com'
text:u'muclgd0004.dedc2.cloud.com'
text:u'muclgd0003.dedc2.cloud.com'
text:u'Dom0'
text:u'muclmx0032.dedc2.cloud.com'
text:u'muclmx0031.dedc2.cloud.com'
text:u'muclmx0030.dedc2.cloud.com'
text:u'muclmx0029.dedc2.cloud.com'
text:u'muclmx0028.dedc2.cloud.com'
text:u'muclmx0027.dedc2.cloud.com'
text:u'muclmx0026.dedc2.cloud.com'
text:u'muclmx0025.dedc2.cloud.com'
text:u'muclgp0002.dedc2.cloud.com'
text:u'muclgp0001.dedc2.cloud.com'
text:u'Hardware Device'
text:u'Exadata X2-2 Quater Rack'
text:u'Exadata X2-2 Quater Rack'
text:u'ZFS Filer'
text:u'BDA'

I am not sure why this strange text:u'' wrapper appears around each value. These characters are not in the Excel sheet.

Can someone please guide me on how to remove them?

Thanks in advance.

The "u" represents a Unicode string. – Rakesh, Feb 05 '18 at 06:25
That's how the cell looks. You want the cell value. – user202729, Feb 05 '18 at 06:25

score 0 · Accepted Answer · answered Feb 05 '18 at 06:46

You're getting the u because you're using Python 2, and you're getting the text: prefix and the quotes because you are printing the cell object (str() on it falls back to its "repr") rather than its value. Use sh.cell_value(row_ids, 1) instead of str(sh.cell(row_ids, 1)).

Once you do that, you can just strip whitespace and check if the result is non-empty:

if cell_text.strip():
    print(cell_text)
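
Putting the two steps together, a minimal sketch might look like the following. It assumes Python 3 and an xlrd-style sheet object exposing nrows and cell_value(row, col); the helper names non_blank_values and print_matching_sheets are hypothetical and not part of xlrd. Values are passed through str() first because cell_value() can return a float for numeric cells, which has no strip() method. The plain substring test stands in for the original re.search(r'Munich', sheet).

```python
def non_blank_values(sheet, col=1):
    """Return the stripped, non-empty values found in one column of a sheet.

    `sheet` is assumed to behave like an xlrd Sheet: it has an `nrows`
    attribute and a `cell_value(row, col)` method.
    """
    values = []
    for row in range(sheet.nrows):
        # cell_value() gives the raw value, not the "text:u'...'" repr of the Cell.
        text = str(sheet.cell_value(row, col)).strip()
        if text:  # an empty string is falsy, so blank cells are skipped
            values.append(text)
    return values


def print_matching_sheets(workbook_name, sheet_pattern="Munich"):
    """Print the non-blank second-column values of every matching sheet."""
    import xlrd  # imported here so non_blank_values() is usable without xlrd

    book = xlrd.open_workbook(workbook_name)
    for name in book.sheet_names():
        if sheet_pattern in name:
            for value in non_blank_values(book.sheet_by_name(name)):
                print(value)
```

Calling print_matching_sheets(sys.argv[1]) after checking that an argument was supplied would replace the loop in the question. No regex is needed for the blank check.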
