
I apologize in advance if the quality of the question is bad. I am still beginning to learn the concepts of REST API. I am trying to implement a scalable REST API for data processing. Here is what I could think of so far.

Consider some numerical data that can be retrieved using a GET call:

GET http://my.api/data/123/

Users can apply a sequence of arithmetic operations such as add and multiply. A non-RESTful way to do that is:

GET http://my.api/data/123?add=10&multiply=5

Assumptions:

  • The original data in the DB is not changed. Only an altered version of it is returned to the user.
  • The data is large (say, a large multi-dimensional array), so we can't afford to return all of it with every operation call. Instead, we want to apply operations as a batch and return the final modified data at the end.

There are 2 RESTful ways I am currently considering:

1. Model arithmetic operations as subresources of data.

If we consider add and multiply as subresources of data (as here), we can use:

GET http://my.api/data/123/add/10/

which would be safe and idempotent, given that the original data is never changed. However, we need to chain multiple operations. Can we do that?

GET http://my.api/data/123/add/10/multiply/5/

Here, multiply creates a subresource of add/10/, which is itself a subresource of data/123.

Pros:

  • Statelessness: The server doesn't keep any information about the modified data.
  • Easy access to modified data: It is just a simple GET call.

Cons:

  • Chaining: I don't know if it can be easily implemented.
  • Long URIs: with each operation applied, the URI gets longer and longer.
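
On the chaining concern, here is a minimal sketch of how the path segments after the resource ID could be read in pairs and applied left to right. It assumes every operation takes exactly one numeric argument. The names `OPERATIONS`, `parse_chain`, and `apply_chain` are hypothetical and not part of any framework.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: operation name -> function of (value, argument).
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda x, n: x + n,
    "multiply": lambda x, n: x * n,
}


def parse_chain(path: str) -> List[Tuple[str, float]]:
    """Split 'add/10/multiply/5/' into [('add', 10.0), ('multiply', 5.0)]."""
    parts = [p for p in path.split("/") if p]
    if len(parts) % 2 != 0:
        raise ValueError("each operation needs exactly one argument")
    chain = []
    for name, arg in zip(parts[0::2], parts[1::2]):
        if name not in OPERATIONS:
            raise ValueError(f"unknown operation: {name}")
        chain.append((name, float(arg)))
    return chain


def apply_chain(data: List[float], chain: List[Tuple[str, float]]) -> List[float]:
    """Apply the operations left to right; the input list is not modified."""
    result = list(data)
    for name, arg in chain:
        op = OPERATIONS[name]
        result = [op(x, arg) for x in result]
    return result
```

With this sketch, `apply_chain(data, parse_chain("add/10/multiply/5/"))` adds 10 to every element and then multiplies by 5, without touching the stored data.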

2. Create an editable data object:

In this case, a user creates an editable version of the original data:

POST http://my.api/data/123/

will return

201 Created
Location: http://my.api/data/123/edit/{uniqueid}

Users can then PATCH this editable data

PATCH http://my.api/data/123/edit/{uniqueid}
{add:10, multiply:5}

And finally, GET the edited data

GET http://my.api/data/123/edit/{uniqueid}

Pros:

  • Clean URIs.

Cons:

  • The server has to save the state of edited data.
  • Editing is no longer idempotent.
  • Getting edited data requires users to make at least 3 calls.

Is there a cleaner, more semantic way to implement data processing RESTfully?

Edit:

If you are wondering what the real-world problem behind this is, I am dealing with digital signal processing.

As a simple example, you can think of applying visual filters to images. Following this example, a RESTful web service can do:

GET http://my.api/image/123/blur/5px/rotate/90deg/?size=small&format=png
ahmohamed
  • Why is "GET http://my.api/data/123?add=10&multiply=5" non-RESTful? This assumes no state. – Lucas Crawford Oct 14 '15 at 04:10
  • Well, it depends on how you interpret it. I see `add` and `multiply` as modifying the resource rather than ***scoping*** it. I think the query string should be reserved for scoping, as in `GET http://my.api/data/123/add/10/?subset_from=0&subset_to=10` for example. – ahmohamed Oct 14 '15 at 04:17
  • @ahmohamed, is this a real problem? – Opal Oct 14 '15 at 12:01
  • @Opal Of course. My application involves signal processing. I have edited the question to demonstrate that. – ahmohamed Oct 14 '15 at 20:45
  • Why don't you want to (or why can't you) keep the modified data? – Opal Oct 16 '15 at 07:33
  • I already wrote it in the *Cons* of the second method. Basically, I lose the statelessness and scalability. If 100 people accessed the data, I get 100 copies saved. – ahmohamed Oct 16 '15 at 07:38
  • have you considered this http://stackoverflow.com/a/8275397/3219121 ? – matagus Oct 16 '15 at 10:22
  • @matagus Thanks a lot for the link. It is indeed a very good discussion of the first method (operations as subresources). However, it doesn't discuss chaining (applying operations after one another). – ahmohamed Oct 16 '15 at 10:34

4 Answers

A couple of things worth reviewing in your question.

REST-based APIs are resource based

So looking at your first example, trying to chain transformation properties into the URL path following a resource identifier..

GET http://my.api/data/123/add/10/multiply/5/

..does not fit well (as well as being complicated to implement dynamically, as you already guessed)

Statelessness

The idea of statelessness in REST is built around a single HTTP call containing enough information to process the request and provide a result without going back to the client for more information. Storing the result of an HTTP call on the server is not state, it’s cache.


Now, given that a REST based API is probably not the best fit for your usage, if you do still want to use it here are your options:

1. Use the Querystring with a common URL operation

You could use the Querystring but simplify the resource path to accept all transformations upon a single URI. Given your examples and reluctance to store transformed results this is probably your best option.

GET http://my.api/data/123/transform?add=10&multiply=5
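
One detail matters with this option: add and multiply do not commute, so the server has to respect the order of the parameters. Here is a minimal sketch, assuming the server treats parameter order as significant (an assumption of this example, not something HTTP guarantees). It uses the standard library's `urllib.parse.parse_qsl`, which returns the pairs in the order they appear. The function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def operations_from_url(url: str) -> list:
    """Return the query parameters as an ordered list of (name, value) pairs."""
    query = urlsplit(url).query
    # parse_qsl keeps the pairs in the order they appear in the query string.
    return [(name, float(value)) for name, value in parse_qsl(query)]


def transform(value: float, operations: list) -> float:
    """Apply 'add' and 'multiply' in the order given."""
    for name, arg in operations:
        if name == "add":
            value += arg
        elif name == "multiply":
            value *= arg
        else:
            raise ValueError(f"unknown operation: {name}")
    return value
```

In this sketch, `?add=10&multiply=5` and `?multiply=5&add=10` give different results for the same input, so clients would need to be told that order is part of the contract.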

2. Use POST non-RESTfully

You could use POST requests, and leverage the HTTP body to send in the transformation parameters. This will ensure that you don’t ever run out of space on the query string if you ever decide to do a lot of processing and it will also keep your communication tidier. This isn’t considered RESTful if the POST returns the image data.

3. Use POST RESTfully

Finally, if you decide that you do want to cache things, your POST can in fact store the transformed object (note that REST doesn’t dictate how this is stored, in memory or DB etc.) which can be re-fetched by Id using a GET.

Option A

POSTing to the URI creates a subordinate resource.

POST http://my.api/data/123
{add:10, multiply:5}

returns

201 Created
Location: http://my.api/data/123/edit/{uniqueid}

then GET the edited data

GET http://my.api/data/123/edit/{uniqueid}

Option B

Remove the resource identifier from the URL to make it clear that you're creating a new item, not changing the existing one. The resulting URL is also at the same level as the original one since it's assumed it's the same type of result.

POST http://my.api/data
{original: 123, add:10, multiply:5}

returns

201 Created
Location: http://my.api/data/{uniqueid}

then GET the edited data

GET http://my.api/data/{uniqueid}
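
A minimal sketch of Option B, assuming in-memory dictionaries stand in for the database and the cache. `ORIGINALS`, `DERIVED`, `post_data`, and `get_data` are hypothetical names, and the sketch fixes the order as add first, then multiply, since the request body above does not define one.

```python
import uuid
from typing import Dict, List

# Hypothetical in-memory stand-ins for the database and the result cache.
ORIGINALS: Dict[str, List[float]] = {"123": [1.0, 2.0, 3.0]}
DERIVED: Dict[str, List[float]] = {}


def post_data(body: dict) -> tuple:
    """Handle POST /data: compute the transformed copy, return (status, location)."""
    values = ORIGINALS[str(body["original"])]
    # Assumed fixed order for this sketch: add first, then multiply.
    result = [(v + body.get("add", 0)) * body.get("multiply", 1) for v in values]
    new_id = uuid.uuid4().hex
    DERIVED[new_id] = result
    return 201, f"http://my.api/data/{new_id}"


def get_data(data_id: str) -> List[float]:
    """Handle GET /data/{id} for both original and derived resources."""
    if data_id in ORIGINALS:
        return ORIGINALS[data_id]
    return DERIVED[data_id]
```

The original entry is never modified. Whether `DERIVED` lives in memory, in a database, or in an expiring cache is an implementation choice that the URI design does not dictate.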
Oliver Gray
  • Thanks a lot for your answer. I am currently leaning towards RESTful POST Option A `http://my.api/data/123/edit/{uniqueid}`, since it is more semantic in indicating that the response is an **edited** copy of the data. The data can be stored to the server in a short-lived cache for scalability – ahmohamed Oct 22 '15 at 12:31
  • Since the purpose of the question was to stir discussion on the subject, I will accept your answer because you discussed several options. – ahmohamed Oct 22 '15 at 12:33

There are multiple ways this can be done. In the end it should be clean, regardless of what label you want to give it (REST or non-REST). REST is not a protocol with an RFC, so don't worry too much about whether you pass information as URL paths or URL params. The underlying web service should be able to get you the data regardless of how it is passed. For example, Java Jersey will give you your params whether they are query params or URL path segments; it's just an annotation difference.

Going back to your specific problem I think that the resource in this REST type call is not so much the data that is being used to do the numerical operations on but the actual response. In that case, a POST where the data ID and the operations are fields might suffice.

POST http://my.api/operations/

{
    "dataId": "123",
    "operations": [
        {
            "type": "add",
            "value": 10
        },
        {
            "type": "multiply",
            "value": 5
        }
    ]
}

The response would have to point to the location where the result can be retrieved, as you have pointed out. The result, referenced by the location (and ID) in the response, is essentially an immutable object. So that is in fact the resource being created by the POST, not the data used to calculate that result. It's just a different way of viewing it.
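
To make the immutable-result idea concrete, here is a minimal sketch that derives the result ID from a hash of the request, so that identical requests point to the same result instead of creating a new copy each time. This is an assumption of the sketch, not part of the design described above, and `result_id` is a hypothetical name.

```python
import hashlib
import json


def result_id(request_body: str) -> str:
    """Derive a stable ID from the request so equal requests share one result."""
    payload = json.loads(request_body)
    # Re-serialize with sorted keys so formatting differences do not change the ID.
    # The order of the "operations" list is preserved, since it is significant.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Two requests with the same `dataId` and the same operations in the same order map to the same ID, while swapping add and multiply produces a different one.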

EDIT: In response to your comment about not wanting to store the outcome of the operations: you can use a callback to transmit the results of the operation to the caller. You can easily add a field to the JSON input for the host or URL of the callback. If the callback URL is present, you can POST the results of the operation to that URL.

{
    "dataId": "123",
    "operations": [
        {
            "type": "add",
            "value": 10
        },
        {
            "type": "multiply",
            "value": 5
        }
    ],
    "callBack": "<HOST or URL>"
}
Jose Martinez
  • This is definitely the way I'd go. In this way, you could do `POST http://my.api/image/123/filters {"blur":"5px", "rotate":"90deg", "size":"small", "format":"png"}` and have everything you need. – FunkyShu Oct 19 '15 at 18:31
  • Thanks a lot for your answer. Yours is definitely one way to go (and I believe there's no one correct answer to the question). But I think this way is not scalable either, since it creates a new copy of the data each time. – ahmohamed Oct 22 '15 at 12:25
  • Was the data intended to be altered? I thought that the operation was intended to not alter the original data and that client just wants the output of the operation. – Jose Martinez Oct 22 '15 at 12:39
  • I think your understanding is correct. The original data resource is not altered. I mean by 'creating copies' that a new resource (with the altered data) is created with each request. Since the data size (and hence the altered data also) is big, this presents a scalability issue. – ahmohamed Oct 23 '15 at 01:55
  • 3 points to consider. First, whether you use POST, GET, JSON, or params, this should not affect the underlying implementation. If the implementation creates extra objects that need to be kept, then change it. Second point is your requirement `The data is large in size (say a large multi-dimensional array), so we can't afford to return the whole data with every operation call. Instead, we want to apply operations as a batch and return the final modified data in the end.`. Third point, if you do not want to store the object after it is calculated, then you can use a callback. – Jose Martinez Oct 23 '15 at 03:21

Please don't view this as me answering my own question, but rather as a contribution to the discussion.

I have given this a lot of thought. The main problem with the currently suggested architectures is scalability, since the server creates a copy of the data each time it is operated on.

The only way to avoid this is to model operations and data separately. So, similar to Jose's answer, we create a resource:

POST http://my.api/operations/
{add:10, multiply:5}

Note here, I didn't specify the data at all. The created resource represents a series of operations only. The POST returns:

201 Created
Location: http://my.api/operations/{uniqueid}

The next step is to apply the operations on the data:

GET http://my.api/data/123/operations/{uniqueid}

This separate modeling approach has several advantages:

  1. Data is not replicated each time a user applies a different set of operations.
  2. Users create only operations resources, and since their size is tiny, we don't have to worry about scalability.
  3. Users create a new resource only when they need a new set of operations. Going to the image example: if I am designing a greyscale website and want all images converted to greyscale, I can do:

    POST http://my.api/operations/
    {greyscale: "50%"}
    

    And then apply this operation on all my images by:

    GET http://my.api/image/{image_id}/operations/{greyscale_id}
    

    As long as I don't want to change the operation set, I can use GET only.

  4. Common operations can be created and stored on the server, so users don't have to create them. For example:

    GET http://my.api/image/{image_id}/operations/flip
    

    Where operations/flip is already an available operation set.

  5. The same set of operations can easily be applied to different data, and vice versa.

    GET http://my.api/data/{id1},{id2}/operations/{some_operation}
    

    Enables you to compare two datasets that are processed similarly. Alternatively:

    GET http://my.api/data/{id1}/operations/{some_operation},{another_operation}
    

    Allows you to see how different processing procedures affect the result.
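
A minimal sketch of this separation, assuming in-memory dictionaries stand in for real storage. The names `DATA`, `OPERATION_SETS`, `post_operations`, and `get_processed` are hypothetical. Only the small operation list is stored, and the processed data is computed on each GET.

```python
import uuid
from typing import Dict, List, Tuple

# Hypothetical in-memory stores: large data sets and small operation sets.
DATA: Dict[str, List[float]] = {"123": [1.0, 2.0], "456": [3.0]}
OPERATION_SETS: Dict[str, List[Tuple[str, float]]] = {}


def post_operations(operations: List[Tuple[str, float]]) -> str:
    """Handle POST /operations/: store only the operation list, return its ID."""
    op_id = uuid.uuid4().hex
    OPERATION_SETS[op_id] = list(operations)
    return op_id


def get_processed(data_id: str, op_id: str) -> List[float]:
    """Handle GET /data/{data_id}/operations/{op_id}: compute on demand, store nothing."""
    result = list(DATA[data_id])
    for name, arg in OPERATION_SETS[op_id]:
        if name == "add":
            result = [x + arg for x in result]
        elif name == "multiply":
            result = [x * arg for x in result]
        else:
            raise ValueError(f"unknown operation: {name}")
    return result
```

The same `op_id` can be reused with any `data_id`, which is what makes points 3 and 5 cheap. The trade-off is that the result is recomputed on every request unless a cache is added in front.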

ahmohamed

I wouldn't try to describe your math function using the URI or request body. We have a more or less standard language to describe math, so you could use some kind of template.

GET http://my.api/data/123?transform="5*(data+10)"
POST http://my.api/data/123 {"transform": "5*({data}+10)"}

You need code on the client side that can build these kinds of templates, and other code on the server side that can verify, parse, etc. the templates built by the client.
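
As a rough idea of the server-side part, here is a minimal sketch that parses a template with Python's standard `ast` module and evaluates only a small whitelist of arithmetic nodes. It handles the bare `data` form from the GET example, not the `{data}` form. `ALLOWED` and `evaluate` are hypothetical names.

```python
import ast
import operator

# Only these binary operators are accepted; anything else is rejected.
ALLOWED = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(template: str, data: float) -> float:
    """Evaluate an arithmetic template such as '5*(data+10)' for one value."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED:
            return ALLOWED[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id == "data":
            return data
        # Function calls, attribute access, other names, etc. all end up here.
        raise ValueError("unsupported expression")

    return walk(ast.parse(template, mode="eval").body)
```

The expression is never passed to `eval`. Anything outside the whitelist, such as a function call, raises `ValueError`.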

inf3rno
  • The math operations here are just an example for data processing. I don't think we have a standard language for signal (or image) processing in the same way you described. – ahmohamed Oct 22 '15 at 12:27