1

I am importing an excel file with whitespaces at the end of most cell content which need removing. The following script works with sample data:

import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

def num_strip(text):
    try:
        return text.split(" ",1)[0]
    except AttributeError:
        return text

def parse_excel_sheet(input_file, sheet):
    df = pd.read_excel(
        input_file,
        sheetname= sheet,
        parse_cols = 'A,B,C',            
        names=['ID', 'name_ITA', 'name_ENG'],
        converters = {
            'ID' : num_strip,
            'name1' : strip,
            'name2' : strip,
            }
        )
    return df

file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/excel_test.xlsx'
df = parse_excel_sheet(file,'1')
print(df)

however when trying the script on a larger file, parsing the first column 'ID' does not remove whitespaces.

file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/DRS_IL_startingpoint.xlsx'
df = parse_excel_sheet(file,'test')
print(df)
Andreuccio
  • 1,053
  • 2
  • 18
  • 32
  • Update: the script returns different results if executed from a jupyter console or compiled with a text editor (i.e. sublime text). I am very puzzled! – Andreuccio Sep 27 '17 at 09:41
  • 1
    why not do it after you import the data... `df.ID = df.ID.str.strip()` – DJK Sep 27 '17 at 14:02
  • @djk47463 I tried to do something like that in multiple ways (all using `.str` and `.strip()` functions(, but it never worked. Yours does though! Thank you! – Andreuccio Sep 27 '17 at 17:58
  • Glad to help out! :) – DJK Sep 27 '17 at 21:29

1 Answers1

0

I just run your code and found that whitespaces were correctly removed from column 'ID' in larger file:

for i, el in enumerate(df['ID'].values):
# print(i)
if " " in el:
    print(el)

returns no element from 'ID' column: there's no whitespace in these 28 elements. How did you checked that this was not the case?

JustForFun
  • 72
  • 8