0

I have a model in django called "Sample" I want to query and return a large number of rows ~ 100k based on filters. However, it's taking up to 4-5 seconds to return the response and I was wondering whether I could make it faster.

(Need to improve converting from queryset to df to response json. Not querying from DB)

My current code looks like this:

@api_view(['POST'])
def retrieve_signal_asset_weight_ts_by_signal(request):
    
    #code to get item.id here based on request

    qs = Sample.objects.filter(
        data_date__range=[start_date, end_date],
        item__id = item.id).values(*columns_required)

    df = pd.DataFrame(list(qs), columns=columns_required)    
    response = df .to_json(orient='records')

    return Response(response, status=status.HTTP_200_OK)

Based on multiple test cases -- I've noticed that the slow part isn't actually getting the data from DB, it's converting it to a DataFrame and then returning as JSON. It's actually taking about 2 seconds just for this part df = pd.DataFrame(list(qs), columns=columns_required). Im looking for a faster way to convert queryset to a json which I can send as part of my "response" object!

Based on this link I've tried other methods including django-pandas and using .values_list() but they seem to be slower than this, and I noticed many of the answers are quite old so I was wondering whether Django 3 has anything to make it faster.

Thanks

Django version : 3.2.6

Shubham Periwal
  • 2,198
  • 2
  • 8
  • 26
  • 1
    `item__id` may be inefficient. If Django or the DB doesn't optimize it, it specifies joining two tables. The value of a ForeignKey field is the id of the row in the related table, and IIRC you can use `item_id=item.id` (one underscore, not two). Alternatively if you have an `Item` instance `item`, then `item=item` will do what you want. – nigel222 Sep 24 '21 at 09:58

1 Answers1

2

With your code, you can't write:

(Need to improve converting from queryset to df to response json. Not querying from DB)

It's actually taking about 2 seconds just for this part

df = pd.DataFrame(list(qs), columns=columns_required)

Get data from database is a lazy operation, so the query will be executed only when data is needed list(qs). According to the documentation:

QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated. Take a look at this example:

Try to separate operation:

records = list(qs)
df = pd.DataFrame(records, columns=columns_required))

Now, you can determine which operation is time-consuming.

Maybe, you look at StreamingHttpResponse

Corralien
  • 109,409
  • 8
  • 28
  • 52