I want to show distinct cities of Users in the front end dropdown. For that, i make a db query which fetches distinct city_name
from table City
but only those cities where users are present.
Something like below works for a small size of User
table, but takes a very long time if User
table in of size 10 million. Distinct cities of these users are still ~100 though.
class City(models.Model):
city_code = models.IntegerField(unique=True)
city_name = models.CharField(max_length=256)
class User(models.Model):
city = models.ForeignKey('City', to_field='city_code')
Now i try to search distinct city names as:
City.objects.filter().values_list('city__city_name').distinct()
which translates to this on PostgreSQL:
SELECT DISTINCT "city"."city_name"
FROM "user"
LEFT OUTER JOIN "city"
ON ("user"."city_id" = "city"."city_code");
Time: 9760.302 ms
That clearly showed that PostgreSQL was not making use of index on 'user'.'city_id'. I also read about a workaround solution here which involved writing a custom SQL query which somehow utilizes index.
I tried to find distinct 'user'.'city_id' using the above query, and that actually turned out to be pretty fast.
WITH
RECURSIVE t(n) AS
(SELECT min(city_id)
FROM user
UNION
SELECT
(SELECT city_id
FROM user
WHERE city_id > n order by city_id limit 1)
FROM t
WHERE n is not null)
SELECT n
FROM t;
Time: 79.056 ms
But now i am finding it hard to incorporate this in my Django code. I still think it is a kind of hack adding custom query in the code for this. But a bigger concern for me is that the column name can be totally dynamic, and i can not hardcode these column names (eg. city_id, etc.) in the code.
#original_fields could be a list from input, like ['area_code__district_code__name']
dataset_klass.objects.filter().values_list(*original_fields).distinct()
Using the custom query would need atleast splitting the field name with '__' as delimiter and process the first part. But it looks like a bad hack to me.
How can i improve this?
PS. The City
User
example is just shown to explain the scenario. The syntax might not be correct.