1

I have two CSV files that look like this..

CSV 1

reference  |  name  |  house
----------------------------
2348A      |  john  |  37
5648R      |  bill  |  3
RT48       |  kate  |  88
76A        |  harry |  433

CSV2

reference
---------
2348A
76A

Using Python and CSVkit I am trying to create an output CSV of the rows in CSV1 by comparing it to CSV2. Does anybody have an example they can point me in the direction of?

fightstarr20
  • 11,682
  • 40
  • 154
  • 278

2 Answers2

4

I would recommended to use pandas to achieve what you are looking for:

And here is how simple it would be using pandas, consider your two csv files are like this:

CSV1

reference,name,house
2348A,john,37
5648R,bill,3
RT48,kate,88
76A,harry ,433

CSV2

reference
2348A
76A

Code

import pandas as pd
df1 = pd.read_csv(r'd:\temp\data1.csv')
df2 = pd.read_csv(r'd:\temp\data2.csv')
df3 = pd.merge(df1,df2, on= 'reference', how='inner')
df3.to_csv('outpt.csv')

output.csv

,reference,name,house
0,2348A,john,37
1,76A,harry ,433
Abbas
  • 3,872
  • 6
  • 36
  • 63
0

I'd recommend using a tool like;

  • csvjoin from csvkit or
  • Pandas

Solution 1:

pip install csvkit
$ csvjoin --help
usage: csvjoin [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
               [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-v] [-l]
               [--zero] [-c COLUMNS] [--outer] [--left] [--right]
               [FILE [FILE ...]]

Example: Left Join on column of [reference]

csvjoin --columns "reference" --left CSV1.csv CSV2.csv

Solution 2:

You can also use the pandas library which is easier to manipulate Dataframe and filter by column.

import pandas as pd

df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

df = df1[df1['reference'].isin(df2['reference'])]
df.to_csv('output.csv', index=False)
H.Elci
  • 216
  • 1
  • 5