
Following up on this question here.

I finally wrote up a code generation tool to wrap all my database data into something like this:

Pdtfaamt(fano=212373,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='A3',itemamt=75,type=0).save()
Pdtfaamt(fano=212374,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E1',itemamt=75,type=0).save()
Pdtfaamt(fano=212375,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E6',itemamt=75,type=0).save()
Pdtfaamt(fano=212376,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='C',itemno='A3',itemamt=3,type=1).save()

Yes, that's right! I pulled the entire database out and transformed the data into population instructions so that I can migrate my database up to GAE.

So I deployed the django-nonrel project and used the django-nonrel remote API to trigger the data population process.
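For reference, django-nonrel's remote support sits on top of App Engine's remote_api; stripped of the django-nonrel wiring, binding a script to the production datastore looks roughly like this (the app hostname below is a placeholder):

import getpass
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    # Prompt for App Engine credentials.
    return raw_input('Email: '), getpass.getpass('Password: ')

# '/_ah/remote_api' is the default handler path; swap in your own hostname.
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your-app-id.appspot.com')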

It works, except for one problem: it's extremely slow. Could anyone tell me how I can improve the speed? By my calculations, it could take up to 30 days to get all my data up and running on GAE.

PS: I am using django-nonrel, with djangoappengine for the backend.

Winston Chen

2 Answers


Write your import script to take advantage of Python's multiprocessing.Pool:

import multiprocessing

def import_thing(data):
    # Build the entity from a dict of keyword arguments and save it.
    thing = ThingEntity(**data)
    thing.put()

def main():
    data = [{'fano': '212374', 'comsname': 'SMM'},
            {'fano': '212375', 'comsname': 'SMM'},
            # ...etc
           ]
    pool = multiprocessing.Pool(4)  # 4 worker processes; tune this number
    pool.map(import_thing, data)

Since the App Engine production servers handle many concurrent connections well, you should play around with the pool size to find the best number. This will not work for importing to the dev server, as it's single-threaded.
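If you'd rather measure than guess, a throwaway timing loop over candidate pool sizes is enough. This is only a sketch: it reuses import_thing from above on a small sample of your data, and note that each pass really does write the sample to the datastore.

import time

def find_pool_size(sample_data, sizes=(2, 4, 8, 16)):
    # Time the same sample with different pool sizes; the fastest wins.
    for size in sizes:
        pool = multiprocessing.Pool(size)
        start = time.time()
        pool.map(import_thing, sample_data)
        pool.close()
        pool.join()
        print size, 'workers:', time.time() - start, 'seconds'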

Also important: ensure you are putting entities in batches of, say, 10-20 rather than one at a time, or the round-trips will kill your performance. An improved script should work in chunks, like:

from google.appengine.ext import db  # assuming the plain App Engine db API

def import_batch(batch):
    # One datastore round-trip per batch: db.put accepts a list of entities.
    db.put([ThingEntity(**item) for item in batch])

data = [
    [item1, item2, item3],
    [item4, item5, item6],
    [item7, item8, item9],
]
pool.map(import_batch, data)
Chris Farmiloe
  • I am rewriting the code-gen according to your suggestions. However, I get something like this: `File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 422, in get raise self._value TypeError: __init__() keywords must be strings`. Are we allowed to use multiprocessing on GAE, or is it failing for some other reason? – Winston Chen Jun 30 '11 at 07:01
  • By the way, I am still using the remote console to trigger the code. – Winston Chen Jun 30 '11 at 07:03
  • Your error sounds like you have a dict with unicode-string keys. Ensure you pass your `**kwargs` as `{"prop": "value"}` and not `{u'prop': 'value'}` – Chris Farmiloe Jun 30 '11 at 09:11
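To make that last comment concrete: on Python 2, keyword-argument names must be byte strings, so if the generated dicts arrive with unicode keys (e.g. from JSON), a small conversion helper (hypothetical, not part of the original script) clears the error:

def str_keys(d):
    # **kwargs keys must be byte strings on Python 2, not unicode.
    return dict((str(k), v) for k, v in d.items())

thing = ThingEntity(**str_keys(data))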

You probably want to look into the Mapper API.
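For reference, a mapper there is just a function the framework calls once per datastore entity, yielding mutations instead of writing directly. A minimal sketch following the appengine-mapreduce docs (the per-entity tweak is purely hypothetical):

from mapreduce import operation as op

def process(entity):
    # Called once per entity by the mapper framework; yield the write
    # instead of calling put() so the framework can batch it.
    entity.fatype = entity.fatype.upper()  # hypothetical transformation
    yield op.db.Put(entity)

Note that a mapper operates on entities already in the datastore, so it is more useful for fixing up data after an import than for the initial upload itself.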

Daniel Roseman