I have two CSV files, first.csv and second.csv, which have one column in common.

Example: first.csv has the columns a, b, c, d and second.csv has the columns x, y, a, z.

I have to create a third CSV file with the columns a, b, c, d, x, y, z.

The files do not have the same number of entries, so I must merge only the lines that have the same value in the shared column. Also, the paths of the three CSV files must be passed as parameters.

I was trying to do this in Java but Python would also work!

I don't really know what I should do :(

1 Answer

If the two files always have exactly one shared column and you want to merge the records (lines) that have the same value in that column, then the following code might help you:

import pandas as pd

def merge_csv_files(first_file_path, second_file_path, output_file_path):
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # The join key is the single column name that appears in both files.
    shared_columns = set(first_df.columns) & set(second_df.columns)
    if len(shared_columns) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    shared_column = shared_columns.pop()

    # An inner join keeps only the rows whose key value occurs in both files.
    merged_df = pd.merge(first_df, second_df, on=shared_column, how='inner')

    merged_df.to_csv(output_file_path, index=False)

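if __name__ == "__main__":
    import sys

    # The three paths come from the command line, in this order:
    # first input file, second input file, output file.
    if len(sys.argv) != 4:
        sys.exit(f"usage: {sys.argv[0]} FIRST_CSV SECOND_CSV OUTPUT_CSV")
    merge_csv_files(sys.argv[1], sys.argv[2], sys.argv[3])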
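
If installing pandas is not an option, roughly the same inner join can be sketched with only the standard library `csv` module. This is a minimal sketch, not a drop-in replacement: the function name `merge_csv_stdlib` is made up for illustration, every value is treated as a plain string, and it assumes both files have a header row and well-formed rows (no row longer than its header).

```python
import csv
from collections import defaultdict


def merge_csv_stdlib(first_path, second_path, output_path):
    """Inner-join two CSV files on their single shared column (illustrative helper)."""
    with open(first_path, newline="") as f:
        reader = csv.DictReader(f)
        first_cols = list(reader.fieldnames or [])
        first_rows = list(reader)
    with open(second_path, newline="") as f:
        reader = csv.DictReader(f)
        second_cols = list(reader.fieldnames or [])
        second_rows = list(reader)

    # Look for exactly one column name present in both headers.
    shared = [c for c in first_cols if c in second_cols]
    if len(shared) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    key = shared[0]

    # Index the second file by key value; one value may occur on several rows.
    index = defaultdict(list)
    for row in second_rows:
        index[row[key]].append(row)

    # Output columns: everything from the first file, then the second
    # file's columns without the key.
    extra_cols = [c for c in second_cols if c != key]
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=first_cols + extra_cols)
        writer.writeheader()
        for row in first_rows:
            # Rows of the first file without a match are skipped (inner join).
            for match in index.get(row[key], []):
                writer.writerow({**row, **{c: match[c] for c in extra_cols}})
```

With headers `a,b,c,d` and `x,y,a,z`, the output header is `a,b,c,d,x,y,z`, and only rows whose `a` value occurs in both files are written.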