
To start: I am utilising PostgreSQL JSONFields.

I have the following attribute (field) in my User model:

    class User(AbstractUser):
        ...
        benefits = JSONField(default=dict())
        ...

Currently, I serialize benefits for each User for the front end with DRF:

    benefits = UserBenefit.objects.filter(user=self)
    serializer = UserBenefitSerializer(benefits, many=True)

As the underlying benefits change rarely and slowly, I thought about "caching" the serialized JSON in the database every time there is a change, to avoid the cost of the `UserBenefit.objects.filter(user=user)` QuerySet. Reads would then become simply `user.benefits`, hopefully lightening the DB load across 100K+ users.
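Roughly, the write path I have in mind looks like this (a sketch only: the signal wiring is illustrative; `UserBenefit` and `UserBenefitSerializer` are the model and serializer referenced above):

    # sketch: refresh the cached JSON whenever a UserBenefit row changes
    from django.db.models.signals import post_save, post_delete
    from django.dispatch import receiver

    @receiver([post_save, post_delete], sender=UserBenefit)
    def refresh_cached_benefits(sender, instance, **kwargs):
        user = instance.user
        # note: .data may contain UUIDs etc., so the JSONField needs an
        # encoder that can handle them (this comes up further down)
        user.benefits = UserBenefitSerializer(
            UserBenefit.objects.filter(user=user), many=True
        ).data
        user.save(update_fields=["benefits"])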


1st Q:

Should I do this?

2nd Q:

Is there an efficient way to write the corresponding `serializer.data` (a `rest_framework.utils.serializer_helpers.ReturnList`) to the JSON field?

I am currently using:

    data = serializers.serialize("json", UserBenefit.objects.filter(user=self))

Micheal J. Roberts
  • By adding this field you're basically eliminating queries, backend logic and JSON serialization. JSON serialization is almost always quick, so if you're seeing issues with the current approach you need to check your SQL queries (pay attention to indices, FKs, reverse FKs, M2M, binary fields and large text fields) and backend logic. How many `UserBenefit` objects do you have per user? – awesoon Aug 16 '19 at 10:38
  • So my current issue is that the DB layer is being hit by maybe 4/5 API requests at any one time, and for some reason the `UserBenefits` ViewSet is running quite slowly compared to the rest. Ultimately, a lot of the queries will be stored in this way... – Micheal J. Roberts Aug 16 '19 at 10:41
  • It is hard to say what the right approach is without knowing the actual queries, their timings (ideally with `EXPLAIN ANALYZE`) and the actual backend logic (are there any calculated fields that issue other DB queries?) – awesoon Aug 16 '19 at 10:51
  • Ok, if we skip over that for the time being, I could test my hypothesis if I knew how to handle the 2nd Q. I'm currently seeing a lot of garbled \stuff\ in the JSON returned by the API...I need to somehow convert `benefits` to json ... – Micheal J. Roberts Aug 16 '19 at 10:54
  • Just use `UserBenefitSerializer(benefits, many=True).data` – awesoon Aug 16 '19 at 10:58
  • "if we skip over that for the time being" - I'm afraid we cannot. You're trying to optimize some part of application and don't know what exactly causing the issues. This is wrong. You need to know what exactly you're optimizing and why. Adding caching everywhere does not work because at some point you will realize that you need to update cached values, transform json structure, or add filtering. Caching is almost always a last resort, not a solution, because you're basically adding data duplication and need to sync it yourself. – awesoon Aug 16 '19 at 11:09
  • This is an optimisation. There is no issue per se. This is to improve on what we have already got. Also, the previous comment of "just use" does not work. `Got AttributeError when attempting to get a value for field `benefit` on serializer `OrganisationBenefitSerializer`. The serializer field might be named incorrectly and not match any attribute or key on the `UUID` instance. Original exception text was: 'UUID' object has no attribute 'benefit'.` – Micheal J. Roberts Aug 16 '19 at 11:11
  • This exception means that you're passing something wrong to the serializer. Put a breakpoint and inspect your actual data. – awesoon Aug 16 '19 at 11:15
  • Please specify `benefits = JSONField(default=dict)`, *not* `benefits = JSONField(default=dict())`. By using `{}`, the defaults will all refer to the same (mutable) dictionary, and modifications to the first one will have an impact on the defaults of the others. – Willem Van Onsem Aug 16 '19 at 11:47
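A quick sketch of why that mutable default matters (plain Python, independent of Django):

    # default=dict() evaluates ONCE, so every row would share one dict
    shared = dict()
    a, b = shared, shared
    a["x"] = 1
    assert b == {"x": 1}      # b sees the mutation too

    # default=dict is a callable: Django calls it per row,
    # so each row gets its own fresh dict
    c, d = dict(), dict()
    c["x"] = 1
    assert d == {}            # d is unaffected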

1 Answer


For your first question:

It's not a bad idea if you don't want to use other caching alternatives.

If you have to query the database because of some changes (or similar) and you can't cache the whole request, then saving a JSON object can be a pretty good idea. This way you only retrieve the data, skip most of the serializing work, and eliminate the need to query a pivot table for the m2m data. But note that you are also adding a whole bunch of extra data to each row, and unless you need it most of the time, you will be fetching data you don't really need. You can mitigate that with the `values()` function on querysets (see the sketch below), but it still requires more coding. Basically, you trade more bandwidth on the first query and more storage for less processing power. Also, pagination will be really hard to achieve on your benefits if you need it at some point.
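For example, the extra column can be kept out of queries that don't need it with the standard queryset API (a sketch; `defer` and `values` are stock Django):

    # don't load the wide cached-JSON column for list views etc.
    light_users = User.objects.defer("benefits")

    # or select only the columns actually used
    rows = User.objects.values("id", "username")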

Getting m2m relation data is usually pretty fast, depending on the amount of data in your database, but the ultimate way to get better performance is caching the requests and reducing database hits as much as possible.
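As a point of comparison, request-level caching with Django's cache framework might look like this (a sketch; the key scheme and timeout are illustrative):

    from django.core.cache import cache

    def get_user_benefits(user):
        key = f"user-benefits:{user.pk}"   # illustrative key scheme
        data = cache.get(key)
        if data is None:
            data = UserBenefitSerializer(
                UserBenefit.objects.filter(user=user), many=True
            ).data
            cache.set(key, data, timeout=3600)  # expire after an hour
        return data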

And as you have probably heard a lot: you should test and benchmark to see which option really works best for you, given your requirements and limitations. It's really hard to suggest an optimization method without knowing the whole scope and the current solution.

And for your second question:

I think I don't really get it. If you are storing a JSON object in a field on the User model, then why do you need `data = serializers.serialize("json", UserBenefit.objects.filter(user=self))`?

You don't need it, since the serializer can just return the JSON field data.
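On the `Object of type UUID is not JSON serializable` error that comes up in the comments below: DRF ships a `JSONEncoder` that handles UUIDs (plus datetimes, Decimals, etc.), and Django's postgres `JSONField` accepts an `encoder` argument (Django 2.1+), so a sketch would be:

    from django.contrib.postgres.fields import JSONField
    from rest_framework.utils.encoders import JSONEncoder

    class User(AbstractUser):
        # default=dict (the callable!), plus DRF's encoder for UUID support
        benefits = JSONField(default=dict, encoder=JSONEncoder)

    # writing the serializer output straight into the field:
    user.benefits = UserBenefitSerializer(
        UserBenefit.objects.filter(user=user), many=True
    ).data
    user.save(update_fields=["benefits"])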

Navid Zarepak
  • To answer your question on why I am using serializers, I'm using that to write the data to the JSONField ... ideally I would like to know how to write a QuerySet to the JSONField... – Micheal J. Roberts Aug 16 '19 at 11:00
  • oh, ok I get it now. You can write a DRF model serializer for your benefit model and just use it to get the serialized data and then store it. – Navid Zarepak Aug 16 '19 at 11:04
  • Hmmmm, I see: `Object of type UUID is not JSON serializable` – Micheal J. Roberts Aug 16 '19 at 11:12
  • If it helps, the serializer.data I am trying to store is an OrderedDict... – Micheal J. Roberts Aug 16 '19 at 11:19
  • @WindUpLordVexxos the built-in `json` module does not know how to serialize UUIDs. You can use the `JSONEncoder` from DRF to resolve this error. – awesoon Aug 16 '19 at 11:23
  • you can override the field serializer. – Navid Zarepak Aug 16 '19 at 11:26
  • @awesoon How do I do that? – Micheal J. Roberts Aug 16 '19 at 11:30
  • @NavidZarepak How do I do that? – Micheal J. Roberts Aug 16 '19 at 11:31
  • Effectively, how do I pass `benefits` to `JSONEncoder`? Could I just import JSONEncoder and then use: `json.dumps(benefits, cls=JSONEncoder)` – Micheal J. Roberts Aug 16 '19 at 11:34
  • @WindUpLordVexxos https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#django.contrib.postgres.fields.JSONField and https://docs.python.org/3/library/json.html#module-json. And given that you don't know how to use custom json encoder I **strongly** suggest you to find the actual performance issue firstly before adding any caching field. I feel like you don't understand the issue you're trying to resolve. – awesoon Aug 16 '19 at 11:41
  • Use `SerializerMethodField` to create a field (or override one), then define a function `get_yourfieldname(self, obj)` that returns the serialized value; you just return the UUID string (see the sketch after these comments). https://www.django-rest-framework.org/api-guide/fields/#serializermethodfield – Navid Zarepak Aug 16 '19 at 11:42
  • @NavidZarepak there is nothing wrong with using UUID objects if a JSON encoder knows how to work with them – awesoon Aug 16 '19 at 11:42
  • Performance has improved locally, will test in a staging environment and get back to you. – Micheal J. Roberts Aug 16 '19 at 11:50
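For reference, the `SerializerMethodField` approach from the comment above might look like this (a sketch; the `benefit` field name is taken from the error message earlier, and `benefit_id` is a hypothetical attribute):

    from rest_framework import serializers

    class UserBenefitSerializer(serializers.ModelSerializer):
        benefit = serializers.SerializerMethodField()

        class Meta:
            model = UserBenefit
            fields = ["benefit"]            # illustrative field list

        def get_benefit(self, obj):
            return str(obj.benefit_id)      # hypothetical attribute name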