1

I have a cell in an excel workbook with comma separated values.

CSV values of column

This cell can have values with following pattern.

0 or 123 or 123, 345.

I want to extract them as list of integers using XLRD or pandas.read_excel.

I have tried using xlrd with the following snippet.

book = open_workbook(args.path)
dep_cms = book.sheet_by_index(1)
for row_index in range(1, dep_cms.nrows)
    excelList = []
    excelList.extend([x.strip() for x in dep_cms.cell(row_index, 8).value.split(',')])

I have even tried pandas

excel_frame = read_excel(args.path, sheet_name=2, skiprows=1, verbose=True, na_filter=False)
data_need = excel_frame['Dependent CMS IDS'].tolist()
print(data_need)

But got the list index is out of range.

Reading sheet 2
Traceback (most recent call last):
  File "ExcelCellCSVRead.py", line 25, in <module>
    excel_frame = read_excel(args.path, sheet_name=2, skiprows=1, verbose=True, na_filter=False)
  File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 311, in read_excel
    return io.parse(
  File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 868, in parse
    return self._reader.parse(
  File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 441, in parse
    sheet = self.get_sheet_by_index(asheetname)
  File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_xlrd.py", line 46, in get_sheet_by_index
    return self.book.sheet_by_index(index)
  File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\xlrd\book.py", line 466, in sheet_by_index
    return self._sheet_list[sheetx] or self.get_sheet(sheetx)
IndexError: list index out of range

It is not working with single value in a cell (for example, just 0 or some value like 123). It is outputting AttributeError: 'float' object has no attribute 'split'.

It only works if I have comma separated values, and converts them into list of strings like ['123', '345']. I guess split condition is the culprit.

How to extract the values of this cell using XLRD or pandas to a list of integers?

Regards

CK5
  • 1,055
  • 3
  • 16
  • 29
  • For `read_excel` `skiprows` must be list-like. Maybe this [link](https://stackoverflow.com/a/49801154/13144143) can help. – n1colas.m May 01 '20 at 12:47
  • Also, `sheet_name=2` means the `third sheet`, make sure that this is what you want or simply use the sheet name. – n1colas.m May 01 '20 at 12:54

1 Answers1

1

Comma seperated value (CSV) cannot be compaired to excel during importing.

Instead of using read_excel you can use read_csv.

below is the code snippet that how your code will look like after applying read_csv

Import Pandas as pd
df = pd.read_csv("your file name.csv")
data_need = df["Column_name"].tolist()
Noob Geek
  • 409
  • 6
  • 20
  • Hi @Noob Geek It is not a csv sheet. It is just that a particular cell in an excel sheet has comma separated values. For example, `0` or `123` or `123, 345`. – CK5 May 01 '20 at 12:47