0

I'm fairly new to pandas, I'm attempting to read an excel file and have the data returned to me based on specific cell values.

I have an excel workbook, with several sheets that has data inside tables presented like so.

  LUID       DATE & TIME IN  S# - IN         STATUS      S# - OUT      DATE & TIME OUT
PTLZ191238    2021-02-21      S12345     LOADING (OUT)   s123456          2021-03-01

Ive been able to iterate through the workbook sheets, now i want the data to be returned only if DATE & TIME IN has a value, and DATE & TIME OUT is blank. this is my code so far

import pandas as pd

#main Tyr Workbook
xl_file = 'excel_workbook.xlsm'

worksheets
fwd = pd.read_excel(xl_file, sheet_name='sheet1`')
epes = pd.read_excel(xl_file,sheet_name='sheet2')
mvt = pd.read_excel(xl_file,sheet_name='sheet3')
ls = pd.read_excel(xl_file,sheet_name='sheet4')
knight = pd.read_excel(xl_file,sheet_name='sheet5')
hzl = pd.read_excel(xl_file,sheet_name='sheet6')
cfi = pd.read_excel(xl_file,sheet_name='sheet7')
hub = pd.read_excel(xl_file,sheet_name='sheet8')
eh = pd.read_excel(xl_file,sheet_name='sheet9')
ttl = pd.read_excel(xl_file,sheet_name='sheet10')

all_sheet = [fwd,epes,mvt,ls,knight,hzl,cfi,hub,eh,ttl]

for sheet in all_sheet:
    print(sheet.head(10))

Ive scoured through the documents and cant seem to find my answer, any help would be greatly appreciated

andre
  • 1

1 Answers1

0

In order to test a dataframe, pandas uses boolean indexing. The predicate can be a specified value, or as in your case, it can be a test against empty or NaN ("Not a number", the default missing value marker in pandas).

Probably best to show with an example:

sheet[sheet["DATE & TIME IN"] == '2021-02-21']

Here we have the dataframe, the dataframe column, the type of test (equality) and the value we are testing against.

To test whether the imported value is missing, notna() tells us it is present, and isna() tests for missing.

sheet[sheet["DATE & TIME IN"].notna()

sheet[sheet["DATE & TIME OUT"].isna()

If we are testing more than one clause, we must wrap each with in brackets and use either "&" for AND or "|" for OR.

sheet[(sheet["DATE & TIME IN"].notna()) & (sheet["DATE & TIME OUT"].isna())]

An dataframe has an attribute named "empty" which will tell you if there is any data. There are other ways of testing this.

Putting it all together:

for sheet in all_sheet:
    # print(sheet.head(10))
    results = sheet[(sheet["DATE & TIME IN"].notna()) & 
                (sheet["DATE & TIME OUT"].isna())]
    if not results.empty:
        print(results.head(10))

Note that if you don't want to manually load all the sheets, and you are happy to leave the dataframe names as the sheet names, you can load all of the workbook in one step by either passing a list to pd.read_excel (which gives you a dictionary of worksheets) or using pd.ExcelFile (which gives you can Excel file object).

all_sheet = pd.read_excel(xl_file, sheet_name=['sheet1', 'sheet2', 'sheet3', 'sheet4', 'sheet5', 'sheet6', 'sheet7', 'sheet8', 'sheet9', 'sheet10'])
all_sheet = []
for xls in pd.ExcelFile(xl_file):
    all_sheet.append(pd.ExcelFile.parse)
Alan
  • 2,914
  • 2
  • 14
  • 26