Let's assume you have two files in a directory called excel_test
:
1.xlsx
Sheet1 Sheet2
col1 col2 col1 col2
1 2 3 4
2.xlsx
Sheet1 Sheet2
col1 col2 col1 col2
5 6 7 8
You can store your extracted data in multiple ways, let's see some methods:
1) A single dictionary
A dictionary where all the keys are strings composed by the "name of the file, underscore, sheet name" and the values are pd.DataFrame
s
import pandas as pd
import os
files_dir = "excel_test/"
files = os.listdir(files_dir)
# here will be stored all the content from your files and sheets
sheets_content = {}
# reading all files inside the folder
for file in files:
# reading the content of a xlsx file
data = pd.ExcelFile(files_dir+file)
# iterating through all sheets
for sheet in data.sheet_names:
# saving the content of the sheet for that file (-5 deletes the .xlsx part from the name of the file and makes everything more readable)
sheets_content[file[:-5]+"_"+sheet] = data.parse(sheet)
print(sheets_content)
Output:
{'1_Sheet1':
col1 col2
0 1 2,
'1_Sheet2':
col1 col2
0 3 4,
'2_Sheet1':
col1 col2
0 5 6,
'2_Sheet2':
col1 col2
0 7 8
}
2) A dictionary of dictionaries
Store all xlsx files in a dictionary with as keys the file names and as value another dictionary. The inner dictionary has keys as sheets names and values are pd.DataFrame
s:
import pandas as pd
import os
files_dir = "excel_test/"
files = os.listdir(files_dir)
sheets_content = {}
for file in files:
data = pd.ExcelFile(files_dir+file)
file_data = {}
for sheet in data.sheet_names:
file_data[sheet] = data.parse(sheet)
sheets_content[file[:-5]] = file_data
Output:
{'1':
{'Sheet1':
col1 col2
0 1 2,
'Sheet2':
col1 col2
0 3 4},
'2':
{'Sheet1':
col1 col2
0 5 6,
'Sheet2':
col1 col2
0 7 8}
}
3) A list of dictionaries
Store all xlsx files in an array where each element is a dictionary with as keys the sheets names and values pd.DataFrame
s:
import pandas as pd
import os
files_dir = "excel_test/"
files = os.listdir(files_dir)
sheets_content = []
for file in files:
data = pd.ExcelFile(files_dir+file)
file_data = {}
for sheet in data.sheet_names:
file_data[sheet] = data.parse(sheet)
sheets_content.append(file_data)
Output:
[
{'Sheet1':
col1 col2
0 1 2,
'Sheet2':
col1 col2
0 3 4},
{'Sheet1':
col1 col2
0 5 6,
'Sheet2':
col1 col2
0 7 8}
]