
I am reading about Dropbox's Async Task Framework (ATF) and its architecture on the Dropbox tech blog: https://dropbox.tech/infrastructure/asynchronous-task-scheduling-at-dropbox

The architecture seems clear to me, but what I can't understand is how the callbacks (or "lambdas" in their terminology) can be stored in the database for later execution. They are just normal programming-language functions, right? Or am I missing something here?

Also, the article says:

It would need to support nearly 100 unique async task types from the start, again with room to grow.

It seems they are talking about types of lambdas here. But how is that even possible when the user can provide arbitrary code in the callback function?

Any help would be appreciated. Thanks!

Kaushal28

2 Answers


I found the answer in the article itself. The core ATF framework just defines the types of tasks/callbacks it supports (e.g. "send email" is a type of task) and creates corresponding SQS queues for them (for each task type, there are multiple queues for different priorities).

The user (who schedules the task) does not provide the function definition while scheduling the task. They only provide details of the function/callback they want to schedule. Those details are pushed to the SQS queue, and it is the user's responsibility to run worker machines that listen for that specific type of task on SQS and that also have the function/callback definition (e.g. the actual logic for sending the email).

Therefore, there is no need to store the function definition in the database. Here's the exact section from the article that describes this: https://dropbox.tech/infrastructure/asynchronous-task-scheduling-at-dropbox#ownership-model

Ownership model
ATF is designed to be a self-serve framework for developers at Dropbox. The design is very intentional in driving an ownership model where lambda owners own all aspects of their lambdas’ operations. To promote this, all lambda worker clusters are owned by the lambda owners. They have full control over operations on these clusters, including code deployments and capacity management. Each executor process is bound to one lambda. Owners have the option of deploying multiple lambdas on their worker clusters simply by spawning new executor processes on their hosts.
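In other words, only a reference to the lambda (its registered type/name) plus its serialized parameters ever reaches the queue and the task store; the code itself lives on the owner's worker cluster. Here is a rough sketch of that idea in C# (not ATF's actual API; the names, the in-memory queue, and the dispatch logic are hypothetical stand-ins for SQS and the real worker processes):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch: the queue/store only ever sees a task *reference*
// (a lambda name plus serialized arguments), never the callback's code.
record TaskMessage(string LambdaName, Dictionary<string, string> Args);

static class Scheduler
{
    // Stands in for the SQS queue / task store; only data is kept here.
    public static readonly ConcurrentQueue<TaskMessage> Queue = new();

    public static void Schedule(string lambdaName, Dictionary<string, string> args)
        => Queue.Enqueue(new TaskMessage(lambdaName, args));
}

static class Worker
{
    // The callback definitions live only on the worker, owned by the lambda owner.
    static readonly Dictionary<string, Action<Dictionary<string, string>>> Lambdas = new()
    {
        ["send_email"] = args => Console.WriteLine($"Sending email to {args["to"]}")
    };

    public static void Poll()
    {
        while (Scheduler.Queue.TryDequeue(out var msg))
            Lambdas[msg.LambdaName](msg.Args);   // dispatch by name
    }
}

static class Program
{
    static void Main()
    {
        // The scheduling side only knows the task type and its parameters.
        Scheduler.Schedule("send_email", new() { ["to"] = "user@example.com" });
        Worker.Poll();   // prints: Sending email to user@example.com
    }
}

In the real system, the schedule call writes the task reference into the ATF store/SQS, and only the owner's worker cluster, which has the lambda's code deployed, can execute it.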

Kaushal28

Let me share how this is done in the case of Hangfire, which is a popular job scheduler in the .NET world. I use it as an example because I have some experience with it and its source code is publicly available on GitHub.

Enqueueing a recurring job

RecurringJob.AddOrUpdate(() => Console.WriteLine("Transparent!"), Cron.Daily);

The RecurringJob class defines several overloads for AddOrUpdate to accept different methodCall parameters:

  • Expression<Action>: Synchronous code without any parameter
  • Expression<Action<T>>: Synchronous code with a single parameter
  • Expression<Func<Task>>: Asynchronous code without any parameter
  • Expression<Func<T, Task>>: Asynchronous code with a single parameter

The overloads anticipate not just a delegate (a Func or an Action) but rather an Expression, because that allows Hangfire to retrieve meta information about

  • the type on which
    • the given method should be called
      • with what parameter(s)

Retrieving meta data

There is a class called Job which exposes several FromExpression overloads. All of them call a private method which does all the heavy lifting: it retrieves the type, method, and argument metadata.

From the above example, FromExpression retrieves the following data:

  • type: System.Console, mscorlib
  • method: WriteLine
  • parameter type: System.String
  • argument: "Transparent!"

This information is stored in the Job's properties: Type, Method, and Args.
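As a minimal illustration of what such an inspection looks like (plain expression-tree inspection, not Hangfire's actual FromExpression code), the metadata above can be pulled straight out of the lambda:

using System;
using System.Linq.Expressions;

class ExpressionMetadataDemo
{
    static void Main()
    {
        // The same call that was passed to RecurringJob.AddOrUpdate above.
        Expression<Action> methodCall = () => Console.WriteLine("Transparent!");

        // The lambda's body is a MethodCallExpression, which describes the call as data.
        var call = (MethodCallExpression)methodCall.Body;

        Console.WriteLine(call.Method.DeclaringType);                       // System.Console
        Console.WriteLine(call.Method.Name);                                // WriteLine
        Console.WriteLine(call.Method.GetParameters()[0].ParameterType);    // System.String
        Console.WriteLine(((ConstantExpression)call.Arguments[0]).Value);   // Transparent!
    }
}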

Serializing meta info

The RecurringJobManager receives this job and passes it to a transaction via a RecurringJobEntity wrapper, to perform an update if the definition of the job has changed or if it was not registered at all.

Its GetChangedFields method is where the serialization is done, via the JobHelper and InvocationData classes. Under the hood they use Newtonsoft's Json.NET to perform the serialization.

Back to our example, the serialized job (without the cron expression) looks something like this:

{
   "t":"System.Console, mscorlib",
   "m":"WriteLine",
   "p":[
      "System.String"
   ],
   "a":[
      "Transparent!"
   ]
}

This is what is persisted in the database and read back whenever the job needs to be triggered.
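When the job is due, this metadata is enough to rebuild and invoke the call through reflection. A simplified sketch of that last step (Hangfire actually goes through its InvocationData machinery; here the values are hard-coded and the type lookup is made portable):

using System;

class InvokeFromMetadataDemo
{
    static void Main()
    {
        // Values as they come back from the persisted JSON document. Hangfire
        // resolves the type from the stored assembly-qualified name (e.g.
        // "System.Console, mscorlib"); typeof(Console) is used as a fallback
        // here only so the sketch also runs on runtimes other than .NET Framework.
        Type type          = Type.GetType("System.Console, mscorlib") ?? typeof(Console);
        string methodName  = "WriteLine";
        Type parameterType = typeof(string);            // "System.String" in the JSON
        object[] arguments = { "Transparent!" };

        // Pick the exact overload described by the metadata and invoke it.
        var method = type.GetMethod(methodName, new[] { parameterType });
        method.Invoke(null, arguments);                 // prints: Transparent!
    }
}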

Peter Csala
  • Here your function is simple enough to determine its type and method (I assume type and method are predefined language constructs). But what happens if my function is complex? Like fetching a user's email address from the DB and then sending the email? – Kaushal28 Oct 30 '22 at 05:25
  • @Kaushal28 If you had a custom class with a custom method that fetches an e-mail address and then sends an e-mail, it would work in exactly the same way. In the above sample, mscorlib is the assembly that contains the System.Console class; this can just as well point to your own assembly and your custom-defined class. – Peter Csala Oct 30 '22 at 05:32
  • So I'll need a custom assembly, right? Where does it store it? – Kaushal28 Oct 30 '22 at 05:34
  • 1
    So if I understand this correctly, basically it stores the compiled version of the code to be executed somewhere and stores it's location and parameters and other metadata into the database and when it needs to execute it, it'll use this info to access the assembly or compiled code and directly execute it. So there is no need to store the function in the database. – Kaushal28 Oct 30 '22 at 05:37
  • @Kaushal28 In the case of .NET, the compiled output of your application is mainly an assembly. Hangfire does not need to store it; the Hangfire server can run inside your application, in the background. – Peter Csala Oct 30 '22 at 05:39
  • @Kaushal28 Yes, Hangfire works more or less like that :) – Peter Csala Oct 30 '22 at 05:40
  • 1
    Correct. That works when we want to create task scheduler specific to our app. But when I am creating a general purpose task scheduler, where user can provide arbitrary tasks, that assembly might be a separate component independent from user's code and user will provide it to our framework saying "I want to execute this assembly every week". So we'll have to store it somewhere (like AWS S3) and then fetch it before executing it. – Kaushal28 Oct 30 '22 at 05:44
  • 1
    So what you described is similar to Python's celery, where some methods from application code will be registered as celery tasks and we store the metadata in the database. – Kaushal28 Oct 30 '22 at 05:49
  • 1
    Anyways, take +1 and bounty for the explanation! But I'll accept my answer as it's specific the question. – Kaushal28 Oct 30 '22 at 05:52
  • 1
    @Kaushal28 If you store your assemblies on S3 then it's your application's responsibility to download them and load/register them into your application. – Peter Csala Oct 30 '22 at 05:52
  • @Kaushal28 I highly appreciate your kindness. – Peter Csala Oct 30 '22 at 05:54