
I am trying to extract the purchase_state from Google Play, following the steps below:

    import base64
    import requests
    import smtplib
    from collections import OrderedDict
    import mysql.connector
    from mysql.connector import errorcode
    ......
  1. Query the DB, returning thousands of rows with a purchase_id field from my table.
  2. For each row, extract purchase_id and query Google Play with it; if the previous step returns 1000 rows, Google is queried 1000 times (refresh token + query). A sketch of that call is below.
  3. Add a new purchase-status field from Google Play to a new dictionary, alongside some other fields grabbed from the MySQL query.
  4. The last step is looping over my dict as follows and preparing the desired report.
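
For reference, check_purchase_status (used in the code below) boils down to one authenticated GET per purchase. A minimal sketch, assuming the androidpublisher v2 purchases.products.get endpoint, with PACKAGE_NAME as a placeholder for my app's package:

    def check_purchase_status(access_token, product_id, purchase_id):
        # one HTTPS round-trip to Google Play per purchase
        url = ('https://www.googleapis.com/androidpublisher/v2/applications/'
               'PACKAGE_NAME/purchases/products/%s/tokens/%s'
               % (product_id, purchase_id))
        resp = requests.get(url, params={'access_token': access_token})
        resp.raise_for_status()
        # purchaseState: 0 = purchased, 1 = canceled
        return resp.json()['purchaseState']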

EDIT:

    def build_dic_from_db(data, access_token):
        dic = {}
        for row in data:
            product_id = row['product_id']
            purchase_id = row['purchase_id']
            # one Google Play request per row -- this is the slow part
            status = check_purchase_status(access_token, product_id, purchase_id)
            cnt = 1

            if row['user'] not in dic:
                # first time we see this user: store their metadata
                # plus a per-product counter
                dic[row['user']] = {'id': row['user_id'],
                                    'country': row['country_name'],
                                    'reg_ts': row['user_registration_timestamp'],
                                    'last_active_ts': row['user_last_active_action_timestamp'],
                                    'total_credits': row['user_credits'],
                                    'total_call_sec_this_month': row['outgoing_call_seconds_this_month'],
                                    'user_status': row['user_status'],
                                    'mobile': row['user_mobile_phone_number_num'],
                                    'plus': row['user_assigned_msisdn_num'],
                                    row['product_id']: {'tAttemp': cnt, 'tCancel': status}}
            elif row['product_id'] not in dic[row['user']]:
                dic[row['user']][row['product_id']] = {'tAttemp': cnt, 'tCancel': status}
            else:
                dic[row['user']][row['product_id']]['tCancel'] += status
                dic[row['user']][row['product_id']]['tAttemp'] += cnt
        return dic

The problem is that my code runs slowly (Total execution time: 448.7483880519867 seconds) and I am wondering if there is a way to speed it up. Is there any suggestion?

pm1359
  • The bottleneck seems to come from the Google Play request. Have you considered creating several executors to query the DB in separate threads? – Jean-François Fabre Jun 16 '16 at 18:54
  • @Jean-FrançoisFabre Hey, can you please explain that a little bit more? What does it mean? I am new to Python! To clarify: I have only one SQL query to MySQL, without filtering; then I go through all the purchase_ids and query Google for each purchase. I didn't apply any filtering in MySQL, as my query is huge enough, and I tried to build my filtering into my dictionary. – pm1359 Jun 17 '16 at 08:23

1 Answer


I hope I'm right about this, but the bottleneck seems to be the connection to the Play Store: doing the requests sequentially takes a long time, whereas the server can easily absorb many requests at once. So here's a way to process your jobs with executors (concurrent.futures is in the standard library on Python 3; on Python 2 you need the futures backport package installed). In this example, you'll be able to send 100 requests at the same time.

    from concurrent import futures

    EXECUTORS = futures.ThreadPoolExecutor(max_workers=100)
    jobs = dict()
    for row in data:
        product_id = row['product_id']
        purchase_id = row['purchase_id']
        # submit the request to the pool instead of performing it inline
        job = EXECUTORS.submit(check_purchase_status,
                               access_token, product_id, purchase_id)
        jobs[job] = row

    for job in futures.as_completed(jobs.keys()):
        # here collect your results and do something useful with them :)
        status = job.result()
        # make the connection with the current row
        row = jobs[job]
        # now you have your status and the row
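
Plugged into your build_dic_from_db, the idea is: submit everything first, then aggregate as the results arrive. A rough sketch (your per-user metadata fields are omitted for brevity):

    def build_dic_from_db(data, access_token):
        # phase 1: one job per row, all submitted up front
        jobs = {}
        for row in data:
            job = EXECUTORS.submit(check_purchase_status, access_token,
                                   row['product_id'], row['purchase_id'])
            jobs[job] = row

        # phase 2: aggregate each result as soon as it completes
        dic = {}
        for job in futures.as_completed(jobs):
            row = jobs[job]
            status = job.result()
            products = dic.setdefault(row['user'], {})
            entry = products.setdefault(row['product_id'],
                                        {'tAttemp': 0, 'tCancel': 0})
            entry['tAttemp'] += 1
            entry['tCancel'] += status
        return dic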

And by the way, try to use temporary variables; otherwise you're constantly accessing your dictionary with the same keys, which is bad for both performance and readability of your code.
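
For instance, in the inner branch of your loop, something like this avoids the repeated dic[row['user']][row['product_id']] lookups:

    user_products = dic[row['user']]   # look the user up once
    product_id = row['product_id']
    if product_id not in user_products:
        user_products[product_id] = {'tAttemp': cnt, 'tCancel': status}
    else:
        entry = user_products[product_id]
        entry['tAttemp'] += cnt
        entry['tCancel'] += status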

Jean-François Fabre
  • It seems great! I am going to try it with my case. I would like to be sure about one thing: in each row I have only one purchase_id, so each time I am going to send only one purchase_id to Google. Do you think this solution would work in my case? I thought your solution would be great with a list of purchase_ids, not with only one! What's your opinion? – pm1359 Jun 17 '16 at 10:59
  • That's the point: send 1000 requests without being blocked by Google's response time, just like 1000 users would do at the same time. – Jean-François Fabre Jun 17 '16 at 11:04
  • I have edited my question. I cannot understand how I can send 1000 requests; shall I change the way I have written my function? – pm1359 Jun 17 '16 at 11:35