1

I have two CSV files, both have different numbers of rows and Columns, in File 1, I have the below structure,

PRODUCT NAME PRODUCT TYPE PRODUCT SERIAL
Galaxy S6 Android Phone IAJ83934830485
Surface Pro Windows Tablet IM94545435493
iPhone 13 iOS Device IK983459435493

File 2 has below structure

Shipment Day Price PRODUCT SERIAL
Thursday 70.5 IAJ83934830485
Friday 121.3 IK983459435493

What I am trying to achieve is to compare product serial column to find out the Shipment Day and Price from the other dataframe for devices that are in it so the final result is as follows

PRODUCT NAME PRODUCT TYPE PRODUCT SERIAL Shipment Day Price
Galaxy S6 Android Phone IAJ83934830485 Thursday 70.5
Surface Pro Windows Tablet IM94545435493 N/A N/A
iPhone 13 iOS Device IK983459435493 Friday 121.3

I have tried using Numpy where but results are not accurate. Any suggestions or advise would be really helpful

re-learn
  • 13
  • 3

1 Answers1

1

you can use pandas library to do this:

import pandas as pd

# Read the files into two dataframes.
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Merge the two dataframes, using _ID column as key
df3 = pd.merge(df1, df2, on = 'PRODUCT-SERIAL')
df3.set_index('PRODUCT-SERIAL', inplace = True)

# Write it to a new CSV file
df3.to_csv('file3.csv')

additional reference link: Merge two CSV files based on a data from the first column

arp5
  • 169
  • 10
  • This solution needs `how="left"` in the merge, otherwise it drops the products that don't have a serial match in `df2`. – creanion Jun 27 '22 at 10:02