I currently have 2 users of my PHP app, which is deployed to the Google App Engine (GAE) standard environment. My aim is to have up to 100 users within a year.

All users execute the same app code, but have their own copy of the database.

Every user needs to sync data with three third-party APIs every minute. One of these APIs tends to be very slow to process requests and respond. Another has a stringent throttle in place: it blocks access for a period if more than one API call is made within 60 seconds.

I currently have a cronjob running every minute, which grabs API keys from a user database, makes the three API calls, then repeats the process on the second user database. This works fine, but clearly will not scale.

Using Google App Engine resources, I've devised the following plan to improve the scalability of my app and cope with 100+ users:

  1. Cronjob executes PHP script every minute.
  2. PHP script gets list of DBs on server.
  3. PHP script iterates through list of DBs, creating 3 x GAE push tasks per DB (i.e. 1 per API, per user).
  4. Each push task calls app endpoint that handles sync process for a specific API.
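
The fan-out in steps 2 to 4 can be sketched as follows. This is a minimal illustration in Python rather than PHP, kept free of any GAE SDK calls so it can be run anywhere. The API identifiers, the `/sync/<api>` endpoint layout, and the helper names `build_tasks` and `batches` are all hypothetical; the batch size of 100 follows the per-call figure mentioned in the question.

```python
from itertools import islice

# Hypothetical API identifiers; the question does not name the three APIs.
APIS = ("api_a", "api_b", "api_c")
# Assumed cap on tasks per bulk-add call, as stated in the question.
MAX_BATCH = 100


def build_tasks(databases):
    """Return one task descriptor per (database, API) pair."""
    return [
        {"url": f"/sync/{api}", "params": {"db": db}}
        for db in databases
        for api in APIS
    ]


def batches(items, size=MAX_BATCH):
    """Yield successive chunks of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    dbs = [f"user_{n}" for n in range(100)]
    for chunk in batches(build_tasks(dbs)):
        # In the real app, each chunk would go to one bulk-add call.
        print(len(chunk))
```

With 100 databases and three APIs this produces 300 descriptors, so under the assumed cap the cron handler would need three bulk-add calls rather than one.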

I've not started writing the above routine yet, but it appears to work in principle. The potential issues I foresee are:

  1. The cronjob hits the 1-minute execution limit before the PHP script has finished creating all of the push tasks. I assume this is unlikely, as I can bundle 100 tasks into a single addTasks() call, so the script execution should take < 10 seconds for 100 users.

  2. Task queue backs up due to slow execution times, meaning API calls are made less frequently than every minute. This could cause some unmanageable data sync issues.

  3. Task execution for a user is delayed, but as the cronjob is creating new tasks every minute, this may result in multiple tasks for the same user and same API being executed in less than a 60 second period, blocking access to one of the APIs.
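
One way to reason about issue 3 is a guard inside the task handler that refuses to call the throttled API if the previous call for the same user and API was less than 60 seconds ago. The sketch below is in Python and uses an in-memory dict purely for illustration; the class name `ThrottleGuard` and its methods are hypothetical. On GAE, where several instances may run at once, the timestamps would have to live in a shared store with an atomic check-and-set, which this sketch does not provide.

```python
import time

MIN_INTERVAL = 60.0  # seconds; the throttle window described in the question


class ThrottleGuard:
    """Tracks the last call time per (user, api) and refuses calls made too soon."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        # Stand-in for a shared store; a plain dict is NOT safe across instances.
        self._last_call = {}

    def try_acquire(self, user, api):
        """Return True and record the call if it is allowed, else return False."""
        now = self._clock()
        last = self._last_call.get((user, api))
        if last is not None and now - last < self._min_interval:
            return False
        self._last_call[(user, api)] = now
        return True
```

A handler would call `try_acquire(user, api)` first and skip the sync (or let the task retry later) when it returns `False`, so a delayed task followed closely by a fresh one cannot trip the third party's block.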

Does anybody have any thoughts on the above, experience with task queues of this nature, or any tips on GAE push queues that could help me, please?

Andy

1 Answer

First of all, note that the Task Queue REST API is not available as of February 20, 2018, so the option for working with Task Queues is the new alpha release of the API, called the Cloud Tasks API.

Let me provide some comments to the three points that you highlighted in your question:

  1. In your use case, you would create the different Push Queues just once, and then run the cron job, which executes a handler that creates the tasks for each user. A good solution may be to have several cron jobs, each in charge of creating the push tasks for a subset of users. Handling all the DB lookups and task creation in a single request may not be feasible, depending on how you manage this, so you can schedule several cron jobs for the same time; multiple instances can then be spun up to handle the parallel requests coming from the different cron jobs every minute.
  2. It is true that tasks in a queue are not necessarily processed in the order in which they were enqueued, so this can be a problem if your TASK_2_USER_1 is added to QUEUE_USER_1 while TASK_1_USER_1 is still there, unprocessed. However, you can control the rate at which tasks are processed by defining several directives, as detailed in this guide, to make sure that tasks are executed within the expected time frame.
  3. Related to 2: you can control the workers' scaling behavior by adjusting the rate at which tasks are processed. You could also look into retrieving the state of a task (or a queue) before submitting a new task, i.e. if the previous task has not been executed yet, don't submit a new one. However, that will become a bigger issue over time: if the non-blocking interval is 1 minute and the task submission interval is also 1 minute, there is no slack, and this will probably lead to issues eventually.
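
To get a feel for how rate directives shape a queue, here is a rough Python simulation of a token bucket, the mechanism App Engine queues use to pace task dispatch. It is only a sketch: the function name `simulate_queue` is hypothetical, the parameters `rate_per_sec` and `bucket_size` loosely mirror the `rate` and `bucket_size` queue settings, and it ignores handler run time, retries, and concurrency limits.

```python
def simulate_queue(num_tasks, rate_per_sec, bucket_size, step=0.1):
    """Return the time (in seconds) at which each queued task is dispatched."""
    tokens = float(bucket_size)  # the bucket starts full
    tick = 0
    dispatched = []
    while len(dispatched) < num_tasks:
        # Dispatch as many tasks as there are whole tokens available.
        while tokens >= 1.0 and len(dispatched) < num_tasks:
            tokens -= 1.0
            dispatched.append(round(tick * step, 3))
        # Advance time and refill, never exceeding the bucket capacity.
        tick += 1
        tokens = min(float(bucket_size), tokens + rate_per_sec * step)
    return dispatched


if __name__ == "__main__":
    times = simulate_queue(num_tasks=300, rate_per_sec=5, bucket_size=5)
    print(times[0], times[-1])
```

In this simplified model, 300 tasks at 5 tasks per second with a bucket of 5 only just drain before the next minute's batch arrives, which illustrates why a rate set too low lets the queue back up.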

I think this covers the basics of Task Queues. Any "deeper" question may be too specific to your use case and difficult to help with here (the SO community also prefers specific questions).

As a last suggestion, make sure to apply to be whitelisted for the new Cloud Tasks API in order to get access to the new documentation.

dsesto