Comparing varied CSV files in Python

Question

Suppose I have 2 CSV files:

file 1:

Epitope Name,Epitope,Protein,position,position

3606,NSRSTSLSV,FOO,10,21

File 2:

A,B,C,D,E,F,G,H,I,J,K

0,1,2,3,4,5,6,7,8,9,NSRSTSLSV

Essentially, I want to see if the contents of row 1 in file 1 are found in row 10 of file 2. If the contents match, I'll print a 3rd csv that is a new version of file 1 with a column saying found or not found.

Right now, I'm getting not found for everything, which I know not to be the case. In some cases, the text from file 1 may be found inside a larger block of text from file 2.

Here's what I have so far (adapted from an answer found earlier):

#!/usr/bin/python2.4

import csv

f1 = file ('all_epitopes.csv', 'rb')
f2 = file ('positiveBcell.csv', 'rb')
f3 = file ('results.csv', 'w')

c1 = csv.reader((f1), delimiter=",", quotechar='"')
c2 = csv.reader((f2), delimiter=",", quotechar='"')
c3 = csv.writer((f3), delimiter=",", quotechar='"')


positiveBcell = [row for row in c2]

for all_epitopes_row in c1:
    row = 1
    found = False
    for master_row in positiveBcell:
        results_row = all_epitopes_row
        if all_epitopes_row[2] == positiveBcell[10]:
            results_row.append('FOUND in Bcell List (row ' + str(row) + ')')
            found = True
            break
        row = row +1
    if not found:
        results_row.append('NOT FOUND in Bcell list')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Just curious, is there a reason you did `positiveBcell = [row for row in c2]` for c2 when you did `for all_epitopes_row in c1:` for c1? Also, it would make debugging your code easier (for me at least) if I had access to your csv files. – sihrc Jul 29 '13 at 19:21

sihrc · Accepted Answer · 2013-07-29T20:44:09.033

Suppose your two files are:

file 1:

Epitope Name,Epitope,Protein,position,position

#Row 1#
3606,NSRSTSLSV,FOO,10,21

File 2:

A,B,C,D,E,F,G,H,I,J,K

#Row 10#
0,1,2,3,4,5,6,7,8,9,NSRSTSLSV

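After OP's comment: the comparison should index the current row, master_row[10], rather than the whole list, positiveBcell[10], which is an entire row and never equals a single cell: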

for all_epitopes_row in c1:
    row = 1
    found = False
    for master_row in positiveBcell:
        results_row = all_epitopes_row
        # Fixed line: index the current row (master_row), not the whole list (positiveBcell)
        if all_epitopes_row[2] == master_row[10]:
            results_row.append('FOUND in Bcell List (row ' + str(row) + ')')
            found = True
            break
        row = row +1
    if not found:
        results_row.append('NOT FOUND in Bcell list')
    c3.writerow(results_row)
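
The question also mentions that the text from file 1 may sit inside a larger block of text in file 2, and an exact == comparison would still miss those cases. Below is a minimal sketch of a substring-based variant, written for Python 3 rather than the Python 2.4 used above. The names find_row, annotate_matches and compare_files, and the needle_col / haystack_col parameters, are hypothetical and not part of the original code. The defaults 2 and 10 simply mirror the indices used above and may need adjusting to whichever columns hold the text in your files.

```python
import csv


def find_row(needle, bcell_rows, haystack_col):
    """Return the 1-based number of the first row whose haystack_col
    cell contains needle as a substring, or None if there is none."""
    for row_number, master_row in enumerate(bcell_rows, start=1):
        # Rows too short to have the column (e.g. blank lines) are skipped.
        if len(master_row) > haystack_col and needle in master_row[haystack_col]:
            return row_number
    return None


def annotate_matches(epitope_rows, bcell_rows, needle_col=2, haystack_col=10):
    """Return copies of epitope_rows with a FOUND / NOT FOUND cell appended."""
    results = []
    for epitope_row in epitope_rows:
        # A missing or empty cell is treated as not found.
        needle = epitope_row[needle_col] if len(epitope_row) > needle_col else ''
        row_number = find_row(needle, bcell_rows, haystack_col) if needle else None
        if row_number is None:
            status = 'NOT FOUND in Bcell list'
        else:
            status = 'FOUND in Bcell List (row %d)' % row_number
        # Build a new list so the input rows are not modified.
        results.append(list(epitope_row) + [status])
    return results


def compare_files(epitopes_path, bcell_path, results_path, **columns):
    """Read both CSV files, annotate the first one, and write the result."""
    # newline='' is what the csv module documentation recommends in Python 3.
    with open(epitopes_path, newline='') as f1, open(bcell_path, newline='') as f2:
        epitope_rows = list(csv.reader(f1))
        bcell_rows = list(csv.reader(f2))
    with open(results_path, 'w', newline='') as f3:
        csv.writer(f3).writerows(annotate_matches(epitope_rows, bcell_rows, **columns))
```

With the sample rows shown at the top of this answer, the epitope sequence is at index 1 of file 1, so a call such as compare_files('all_epitopes.csv', 'positiveBcell.csv', 'results.csv', needle_col=1) would be the one that matches NSRSTSLSV against column 10 of file 2.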

edited Jul 29 '13 at 20:44

answered Jul 29 '13 at 19:29

sihrc


Oh. I'm seeing the problem now. I've written this to look at rows when I really meant to specify columns. So what I should have specified I'm looking for is column 10 matching column 2 not row 10 and row 2. I'll see if I can come up with a fix – Gdodge77 Jul 29 '13 at 19:58
For more clarity, I want to search for the entries in C2 (csv 1) and try and find them anywhere in C10 of csv2. – Gdodge77 Jul 29 '13 at 20:22
@Gdodge77 Then your issue is where you have positiveBcell[10] should be master_row[10] – sihrc Jul 29 '13 at 20:31
Beautiful! Works perfectly now! – Gdodge77 Jul 29 '13 at 20:38
@Gdodge77 If this post sufficiently answers your question, please hit that check mark next to the post to accept it as the answer. Thanks, and have a great day. – sihrc Jul 29 '13 at 20:43
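
The reason every row originally came back as not found can be reproduced in a few lines. The sketch below uses made-up in-memory rows (not the real CSV files) to illustrate the difference discussed in the comments: indexing the list positiveBcell yields a whole row, which is a list and never equals a string, while indexing master_row yields a single cell.

```python
# Hypothetical stand-in for the parsed contents of the second CSV file:
# 11 rows of 11 cells each, so that both [10] lookups are valid.
positiveBcell = [[str(r)] * 10 + ['cell-%d' % r] for r in range(11)]

target = 'cell-3'

# Indexing the outer list gives the 11th *row*, a list of strings.
whole_row = positiveBcell[10]
print(type(whole_row).__name__)   # list
print(target == whole_row)        # False: a str never equals a list

# Indexing the loop variable gives the 11th *cell* of the current row.
matches = [master_row[10] == target for master_row in positiveBcell]
print(matches.index(True) + 1)    # 4, the 1-based row number of the match
```

Had the second file contained fewer than 11 rows, positiveBcell[10] would have raised an IndexError instead of silently comparing unequal.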
