First, make sure your code makes no blocking calls: a blocking call stops every other green thread from running, which slows the entire system.
Common causes are tight loops and IO that eventlet's monkey patching does not cover (e.g. IO done inside C extensions).
Celery supports both eventlet and gevent, and one of them is probably the recommended concurrency
option for what you are doing (web request IO). Celery will not necessarily make your code run faster, but it does make it easy to distribute the work across many machines.
To optimize, always profile your code first to find out where the bottleneck is. It could be many things, e.g. a slow network, a slow host, slow DNS, or something else entirely.
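As a rough starting point for profiling, the sketch below uses the standard library's `cProfile` and `pstats`. The functions `slow_lookup`, `parse` and `handle_request` are hypothetical stand-ins for your own code. The `time.sleep` call only simulates a slow network or DNS round trip.

```python
import cProfile
import io
import pstats
import time


def slow_lookup():
    # Hypothetical stand-in for a slow DNS or network round trip.
    time.sleep(0.05)


def parse():
    # Hypothetical stand-in for CPU work done on the response.
    return sum(i * i for i in range(10000))


def handle_request():
    slow_lookup()
    return parse()


def profile(func):
    # Run func under cProfile and return a report sorted by cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile(handle_request))
```

Sorting by cumulative time puts the call paths where the program spends most of its wall-clock time at the top of the report, which is usually enough to tell waiting on IO apart from CPU work.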