Want to use pandas to make it look/work like SQL

Question

    import pandasql as pdsql  # assumption: pdsql is the pandasql package

    pysql = lambda q: pdsql.sqldf(q, globals())
    str1 = (
        "select coalesce(ID1, H_ID, [Alternate Source Unique Identifier]) as Master_ID, "
        "[Alternate Source Unique Identifier] as Q_ID "
        "from crosswalk;"
    )
    # Timer is a timing context manager that is not shown in the question
    with Timer("Load master_ids:"):
        master_id_list = pysql(str1)
    print("Records: {}".format(len(master_id_list)))
    master_id_list.head()

The pysql version runs in just 5 seconds!
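The `Timer` used in the script is not shown in the question. A minimal sketch of what such a helper might look like, assuming it is a context manager that prints a label and the elapsed wall-clock time (the implementation below is hypothetical, not the asker's code):

```python
import time
from contextlib import contextmanager


@contextmanager
def Timer(label):
    # Hypothetical stand-in for the Timer used in the question:
    # measures wall-clock time spent inside the `with` block.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print("{} {:.3f} s".format(label, elapsed))


# Example usage with a trivial workload in place of the real query
with Timer("Load master_ids:"):
    total = sum(range(1000))
```

Timing both the pysql version and any pandas rewrite with the same helper keeps the comparison like for like.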

I want to write the second script in Python because I can't use pysql. Any ideas? What would be your best translation of the second script into Python?

I have tried two approaches, but neither is really efficient in terms of time. The first one, without pandas (which is the requirement):

    def coalesce(df, column_names):
        # Start from the first column, then fill its gaps from each later column in order
        i = iter(column_names)
        column_name = next(i)
        answer = df[column_name]
        for column_name in i:
            answer = answer.fillna(df[column_name])
        return answer

    coalesce(df, ['first', 'third', 'second'])

Thank you for your advice!

Why are you using the ``copy()`` method? I think it creates a lot of overhead. – Dimgold Jul 13 '17 at 07:01

score 0 · Answer 1 · answered Jul 13 '17 at 07:05

If all you need is the number of records (len(master_id_list)), you can get it directly:

    crosswalk['ID'].size

or

    crosswalk.shape[0]  # shape is a (rows, columns) tuple

If you are looking for unique values, try:

    crosswalk['ID'].unique().size
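For the query in the question itself, a minimal sketch of a pandas equivalent, assuming `crosswalk` is a DataFrame with the columns `ID1`, `H_ID` and `Alternate Source Unique Identifier` (the sample rows below are made up for illustration). SQL's `coalesce` maps onto a chain of `fillna` calls, each of which works on whole columns rather than row by row:

```python
import pandas as pd

# Hypothetical sample data standing in for the real crosswalk table
crosswalk = pd.DataFrame({
    "ID1": ["a1", None, None],
    "H_ID": ["h1", "h2", None],
    "Alternate Source Unique Identifier": ["q1", "q2", "q3"],
})

alt = "Alternate Source Unique Identifier"

# coalesce(ID1, H_ID, alt): take ID1, fall back to H_ID, then to alt
master_id_list = pd.DataFrame({
    "Master_ID": crosswalk["ID1"].fillna(crosswalk["H_ID"]).fillna(crosswalk[alt]),
    "Q_ID": crosswalk[alt],
})

print("Records: {}".format(len(master_id_list)))
print(master_id_list.head())
```

No `copy()` is needed here, since `fillna` returns a new Series and leaves `crosswalk` unchanged.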

Dimgold

Thank you, OK for this, but I need to reformulate the first script using the idea in script 2... does this make sense? – Video retrieval Jul 13 '17 at 07:25
I might be missing something - don't you just print the number of results? – Dimgold Jul 13 '17 at 09:36
I just want to use pandas and make it look like SQL... also in terms of performance. – Video retrieval Jul 13 '17 at 11:42
