1

I am working with excel for comparing three columns: my idea is to compare two columns of data with a third column as a array like each value in the 3rd column should be compared with every row of the first and second column and want to extract only those rows where the first and second column's data is present in the third column I used this python command

if([x in x,y for datafile] == [x in x for file) and [y in x,y for datafile] == [x in x for file]): 
    print x,y
else:
    print none        

this gave me an error as syntax error

I have converted my first two columns into a tuple using the zip function the x,y corresponds to the values in the tuple

Col_1 ||  Col_2    ||   file
Abc   |    Abk     |    cnl
Nck   |    Nck     |    Abk
xkl   |    cnl     |    Abc  
mzn   |    mzn     |  

this I have combined as datafile ((Abc,Abk),(Nck,Nck),(xkl,cnl),(mzn,mzn))

Note: my column 3 has smaller values than col 1&2. I have over 100k values to compare

I want a working python program for this query

if [x for x,y in mydata if x == genelist and
y for x,y in mydata if y == genelist]:
    print (x,y)
else: 

can someone correct the syntax error in the above code here

mydata('gene1,genea','gene2,geneb''gene3,genec') and genelist ('genea','geneb','genec') 

when I use the code without if statement it prints me "[]" I don't know what's wrong here

James Z
  • 12,209
  • 10
  • 24
  • 44

1 Answers1

1

You could use pandas.Series.isin to filter it:

For your excel data (eg:comparison.xlsx) :

enter image description here

Use:

import pandas as pd
df = pd.read_excel('comparison.xlsx')
result = df[df['finaldata1'].isin(list(df['check'])) & df['finaldata2'].isin(list(df['check']))]
result

it will give you:

    finaldata1  finaldata2  check
0   Abc         Abk         cnl

as Abc and Abk is in column file.

Update: Write result to excel file:

from pandas import ExcelWriter

writer = ExcelWriter('PythonExport.xlsx')
result.to_excel(writer,'Sheet1',index=False)
writer.save()

The result will be write into excel file PythonExport.xlsx:

enter image description here

Tiny.D
  • 6,466
  • 2
  • 15
  • 20
  • Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32 Type "copyright", "credits" or "license()" for more information. >>> import pandas as pd >>> import xlrd >>> df = pd.read_excel('Book1.xlsx') >>> result = df[df['finaldata'].isin(list[df['check']])] Traceback (most recent call last): File "", line 1, in result = df[df['finaldata'].isin(list[df['check']])] TypeError: 'type' object has no attribute '__getitem__' – Lakshmi KrishnaKumaar May 17 '17 at 04:28
  • Your result is not correct, should be `result = df[df['finaldata'].isin(list(df['check']))]` **not** `result = df[df['finaldata'].isin(list[df['check']])]`, change to `()` after `list` instead of `[]`. – Tiny.D May 17 '17 at 04:46
  • Empty DataFrame Columns: [finaldata, check] Index: [] – Lakshmi KrishnaKumaar May 17 '17 at 04:52
  • are you sure there is data in your `df` ? it seems that your `df` is empty – Tiny.D May 17 '17 at 04:55
  • data is present anyways the program allows comparison with one column but i wanna compare two columns of data with an array/list of data – Lakshmi KrishnaKumaar May 17 '17 at 05:02
  • thank you so much that worked great I have got the expected output can you help me to write the results onto an excel file from python – Lakshmi KrishnaKumaar May 17 '17 at 05:17
  • If it is solution, pls mark it as an accept answer. I will update it to let you know how to write result to excel file. – Tiny.D May 17 '17 at 05:19
  • I have clicked the up arrow to vote should I do something else to accept ? – Lakshmi KrishnaKumaar May 17 '17 at 05:30
  • Up vote is for the answer is useful, accept is for you to accept it to a right answer for your question. – Tiny.D May 17 '17 at 05:33
  • @LakshmiKrishnaKumaar you haven't mark it as accept answer, please mark it if the answer help you, and it can be closed also, Thanks. – Tiny.D May 17 '17 at 11:24
  • I cant find out how to accept the answer I have upvoted which the blog is accepting how should I accept sorry this is the first question I post herer – Lakshmi KrishnaKumaar May 18 '17 at 05:11
  • @LakshmiKrishnaKumaar please refer to http://stackoverflow.com/help/someone-answers – Tiny.D May 18 '17 at 05:17
  • Done it sorry for the trouble – Lakshmi KrishnaKumaar May 18 '17 at 05:50
  • Cheers, hope you will enjoy StackOverFlow – Tiny.D May 18 '17 at 05:52
  • for the above mentioned question of mine I found a solution in awk awk '{a[$1]; next} $1 in a && ' – Lakshmi KrishnaKumaar Jun 15 '17 at 09:59