0

Have a question about matching data. I have two excel files one in the extract of the database which is updated once a while and doesn't hold all records because it is not linked to the source application where information is stored.

the other extract that I got is the extract of a system where everyone puts in information.

The two excel files have a lot of id numbers. My teacher asks me to match the data so I can see which ones are missing. He told me to use a v look up but that doesn't make sense. is there a more easy way to match data out of two excel sheets?

thanks for your time in advance.

Robel
  • 1

1 Answers1

0

I recommend to use pandas library with concat.

import glob
import pandas as pd
 
# specifying the path to excel files
path = "C:/downloads"
 
# excel files in the path
file_list = glob.glob(path + "/*.xlsx")
 
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the 
# excel data into pandas dataframe.
excl_list = []
 
for file in file_list:
    excl_list.append(pd.read_excel(file))
 
# concatenate all DataFrames in the list
# into a single DataFrame, returns new
# DataFrame.
excl_merged = pd.concat(excl_list, ignore_index=True)
 
# exports the dataframe into excel file
# with specified name.
excl_merged.to_excel('merged_excel.xlsx', index=False)

You can also use pd.merge() once you have your two different file opened (df1 = pd.read_excel(file1); df2 = pd.read_excel(file2)), to merge them on specific column names, maybe id in your case. df1.merge(df2, left_on='lkey', right_on='rkey') to add suffixes according to the file.

Only god knows
  • 307
  • 1
  • 10