1

I founds threads on extracting info from various sheets of the same file and solutions to problems similar, but not exactly like mine.

I have a several Excel workbooks each containing several sheets. I would like to iterate over each workbook and extract information from a sheet name "3. Prices". This sheet is available in all files. The pieces of information to be extracted from this sheet in every file are two. The first is always found in cell range E13:H13 and the second from cells F19, I19 and K19.

I would like place the two pieces of extracted information next to one another (for a given file) and then stack the extract from every file on top in one master file. Also, the first column of the combined file should be the file name.

So something like this:

enter image description here

What I've tried so far, with no luck

from openpyxl import load_workbook
import os
import pandas as pd

directory = os.listdir('C:\\User\\files')

for file in directory:
    if os.path.isfile(file):

        file_name = file[0:3]
        workbook = load_workbook(filename = file)
        sheet = workbook['3. Prices']
        e13 = sheet['E13'].value
        f13 = sheet['F13'].value
        g13 = sheet['G13'].value
        h13 = sheet['H13'].value
        f19 = sheet['F19'].value
        i19 = sheet['I19'].value
        k19 = sheet['K19'].value

        df = df.append(pd.DataFrame({
                "File_name":file_name, 
                "E13":e13, "F13":f13, "G13":g13,"H13":h13,
                "F19":f19,"I19":i19,"K19":i19,
                }, index=[0]))
StatsScared
  • 517
  • 6
  • 20
  • _with no luck_ Please be more specific. – AMC Mar 14 '20 at 03:09
  • By the way, it’s always a good idea to use a temporary data structure and create the DataFrame at the end, rather than repeatedly appending or concatenating. – AMC Mar 14 '20 at 03:15

1 Answers1

0

I figured it out. I was missing two elements: 1) changing the current working directory to match the one in the variable 'directory' and 2) define a dataframe at the start

from openpyxl import load_workbook
import os
import pandas as pd

os.chdir('C:\\User\\files') 
directory = os.listdir('C:\\User\\files')

df=pd.DataFrame()    

for file in directory:
    if os.path.isfile(file):

        file_name = file[0:3]
        workbook = load_workbook(filename = file, data_only=True)
        sheet = workbook['3. Prices']
        e13 = sheet['E13'].value
        f13 = sheet['F13'].value
        g13 = sheet['G13'].value
        h13 = sheet['H13'].value
        f19 = sheet['F19'].value
        i19 = sheet['I19'].value
        k19 = sheet['K19'].value

        df = df.append(pd.DataFrame({
                "File_name":file_name, 
                "E13":e13, "F13":f13, "G13":g13,"H13":h13,
                "F19":f19,"I19":i19,"K19":i19,
                }, index=[0]))
StatsScared
  • 517
  • 6
  • 20