0

Pandas iterates through a workbook and imports data from a list of worksheet names ('fb'). If the list only contains a single value then it adds that sheet to the data frame. If the list contains more than a single value, then it concatenates the worksheets into a single worksheet.

The issue is as currently coded the df2 returns a 3-d shape rather than 2-d. I have tried .melt and .unstack but must be using them incorrectly. The end result should be row=31, column=15. Code works perfectly fine if there are multiple worksheets. The addition of the if to check the list length creates the issue 'if len(fb) >= 2 else pd.DataFrame(df2)'

# create hr fb dataframe
# read data from list of worksheet
df2 = [pd.read_excel(xls, sheet_name=s, skiprows=5, nrows=31, usecols='M:AA', header=None) for s in fb]
# merge above data into a single sheet
dfc2 = pd.concat(df2, ignore_index=True, sort=False) if len(fb) >= 2 else pd.DataFrame(df2)
# rename column index numbers
dfc2.columns = ['C1', 'C2', 'C3', 'C4', 'C5',
                'C6', 'C7', 'C8', 'C9', 'C10', 'C11',
                'C12', 'C13', 'C14', 'C15']
# if column C contains a zero remove the row
dfc2 = dfc2[dfc2.C3 != 0]
# add additional blank columns to end
dfc2[colnum2] = ""
# clear columns containing useless data
dfc2.loc[:, 'C1'] = ""
dfc2.loc[:, 'C11'] = ""
dfc2.loc[:, 'C13'] = ""
# rearange column data to match final format needed
dfc2.loc[:, 'C23'] = dfc2.loc[:, 'C5']
dfc2.loc[:, 'C5'] = ""
dfc2.loc[:, 'C14'] = ""
dfc2.loc[:, 'C17'] = dfc2.loc[:, 'C15']
dfc2.loc[:, 'C15'] = ""
dfc2.loc[:, 'C27'] = dfc2.loc[:, 'C3']
dfc2.loc[:, 'C3'] = ""
dfc2.loc[:, 'C3'] = dfc2.loc[:, 'C7']
dfc2.loc[:, 'C7'] = ""
dfc2.loc[:, 'C7'] = dfc2.loc[:, 'C27']
dfc2.loc[:, 'C27'] = ""
# end of hr fb dataframe creation

enter image description here

I have tried to implement solutions from the below answers to similar questions, but can't seem to get the format correct. Any help would be greatly appreciated.

converting 3D pandas dataframe to 2d [duplicate] Reshape MultiIndex dataframe to tabular format

Rand0mdude
  • 27
  • 1
  • 10

2 Answers2

0

You may just want to slice the array like this answer, before returning. Since the shape is returning (1, 31, 15), you could modify it to be:

df[0, :, :].squeeze()
Lucas
  • 26
  • 3
  • returns a TypeError "list indices must be integers or slices, not tuple". 'if len(fb) >= 2 else pd.DataFrame(df2[0, :, :].squeeze())' might have placed it in the incorrect spot as I am not sure what this .squeeze() is actually doing. – Rand0mdude Oct 03 '22 at 22:08
0

Corrected by using Numpy reshape.

# create hr fb dataframe
# read data from list of worksheet
df2 = [pd.read_excel(xls, sheet_name=s, skiprows=5, nrows=31, usecols='M:AA', header=None) for s in fb]
# merge above data into a single sheet
dfc2 = pd.concat(df2, ignore_index=True, sort=False) if len(fb) >= 2 else pd.DataFrame(np.reshape(df2, (31, 15)))
# rename column index numbers
dfc2.columns = ['C1', 'C2', 'C3', 'C4', 'C5',
                'C6', 'C7', 'C8', 'C9', 'C10', 'C11',
                'C12', 'C13', 'C14', 'C15']
# if column C contains a zero remove the row
dfc2 = dfc2[dfc2.C3 != 0]
# add additional blank columns to end
dfc2[colnum2] = ""
# clear columns containing useless data
dfc2.loc[:, 'C1'] = ""
dfc2.loc[:, 'C11'] = ""
dfc2.loc[:, 'C13'] = ""
# rearange column data to match final format needed
dfc2.loc[:, 'C23'] = dfc2.loc[:, 'C5']
dfc2.loc[:, 'C5'] = ""
dfc2.loc[:, 'C14'] = ""
dfc2.loc[:, 'C17'] = dfc2.loc[:, 'C15']
dfc2.loc[:, 'C15'] = ""
dfc2.loc[:, 'C27'] = dfc2.loc[:, 'C3']
dfc2.loc[:, 'C3'] = ""
dfc2.loc[:, 'C3'] = dfc2.loc[:, 'C7']
dfc2.loc[:, 'C7'] = ""
dfc2.loc[:, 'C7'] = dfc2.loc[:, 'C27']
dfc2.loc[:, 'C27'] = ""
# end of hr fb dataframe creation
Rand0mdude
  • 27
  • 1
  • 10