9

Jump to edit to see more real-life code example, that doesn't work after changing the query order

Here are my models:

class ModelA(models.Model):
    field_1a = models.CharField(max_length=32)
    field_2a = models.CharField(max_length=32)


class ModelB(models.Model):
    field_1b = models.CharField(max_length=32)
    field_2b = models.CharField(max_length=32)

Now, create 2 instances each:

ModelA.objects.create(field_1a="1a1", field_2a="1a2")
ModelA.objects.create(field_1a="2a1", field_2a="2a2")
ModelB.objects.create(field_1b="1b1", field_2b="1b2")
ModelB.objects.create(field_1b="2b1", field_2b="2b2")

If I'll query for only one model with annotations, I get something like that:

>>> ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values("field1", "field2")
[{"field1": "1a1", "field2": "1a2"}, {"field1": "2a1", "field2": "2a2"}]

This is correct behavior. The problem starts, when I want to get union of those two models:

# model A first, with annotate
query = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))
# now union with model B, also annotated
query = query.union(ModelB.objects.all().annotate(field1=F("field_1b"), field2=F("field_2b")))
# get only field1 and field2
query = query.values("field1", "field2")

# the results are skewed:
assert list(query) == [
    {"field1": 1, "field2": "1a1"},
    {"field1": 1, "field2": "1b1"},
    {"field1": 2, "field2": "2a1"},
    {"field1": 2, "field2": "2b1"},
]

The assert passes correctly, which means that the results are wrong. It seems like the values() didn't match the variable name, it just iterated over the object as on a tuple. The value of field1 is actually the object's ID, and field2 is field1.

This is pretty easy to fix in such simple models, but my real models are quite complex, and they have a different number of fields. How do I union them correctly?

EDIT

Below you can find an extended example that fails regardless of the order of union() and values() - the models are slightly bigger now, and it seems that the different fields count somehow confuses Django:

# models

class ModelA(models.Model):
    field_1a = models.CharField(max_length=32)
    field_1aa = models.CharField(max_length=32, null=True)
    field_1aaa = models.CharField(max_length=32, null=True)
    field_2a = models.CharField(max_length=32)
    extra_a = models.CharField(max_length=32)


class ModelB(models.Model):
    extra = models.CharField(max_length=32)
    field_1b = models.CharField(max_length=32)
    field_2b = models.CharField(max_length=32)
# test

ModelA.objects.create(field_1a="1a1", field_2a="1a2", extra_a="1extra")
    ModelA.objects.create(field_1a="2a1", field_2a="2a2", extra_a="2extra")
    ModelB.objects.create(field_1b="1b1", field_2b="1b2", extra="3extra")
    ModelB.objects.create(field_1b="2b1", field_2b="2b2", extra="4extra")

    values = ("field1", "field2", "extra")

    query = (
        ModelA.objects.all()
        .annotate(
            field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
        )
        .values(*values)
    )
    query = query.union(
        ModelB.objects.all()
        .annotate(field1=F("field_1b"), field2=F("field_2b"))
        .values(*values)
    )
# outcome

assert list(query) == [
        {"field1": "1a1", "field2": "1a2", "extra": "1extra"},
        {"field1": "2a1", "field2": "2a2", "extra": "2extra"},
        {"field1": "3extra", "field2": "1b1", "extra": "1b2"},
        {"field1": "4extra", "field2": "2b1", "extra": "2b2"},
    ]
Djent
  • 2,877
  • 10
  • 41
  • 66

3 Answers3

3

After some debugging and going through the source code, I have an idea why this is happening. What I am going to do is try to explain that why doing annotate + values results in displaying the id and what is the difference between the two cases above.

To keep things simple, I will write also write the possible resulting sql query for each statement.

1. annotate first but get values on union query

qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))

When writing something like this, django will get all the fields + annotated fields, so the resulting sql query looks like:

select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA

So, if we have a query which is the result of:

qs = qs1.union(qs2)

the resulting sql for django looks like:

(select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA)
UNION
(select id, field_1b, field_2b, field_1b as field1, field_2b as field2 from ModelB)

Let's go deeper into how this sql is generated. When we do a union, a combinator and combined_queries is set on the qs.query and the resulting sql is generated by combining the sql of individual queries. So, in summary:

qs.sql == qs1.sql UNION qs2.sql # in abstract sense

When, we do qs.values('field1', 'field2'), the col_count in compiler is set to 2 which is the number of fields. As you can see that the union query above returns 5 columns but in the final return from compiler each row in the results is sliced using col_count. Now, this results with only 2 columns is passed back to ValuesIterable where it maps each name in the selected fields with the resulting columns. That is how it leads to the incorrect results.

2. annotate + values on individual queries and then perform union

Now, let's see what happens when annotate is used with values directly

qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values('field1', 'field2')

The resulting sql is:

select field_1a as field1, field_2a as field2 from ModelA

Now, when we do the union:

qs = qs1.union(qs2)

the sql is:

(select field_1a as field1, field_2a as field2 from ModelA)
UNION
(select field_1b as field1, field_2b as field2 from ModelB)

Now, when qs.values('field1', 'field2') executes, the number of columns returned from union query has 2 columns which is same as the col_count which is 2 and each field is matched with the individual columns producing the expected result.


3. Different field annotation count and ordering of fields

In the OP, there is a scenario when even using .values before union doesn't produce correct results. The reason for that is that in the ModelB, there is no annotation for extra field.

So, let's look at the queries generated for each model:

ModelA.objects.all()
        .annotate(
            field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
        )
        .values(*values)

The SQL becomes:

select field_1a as field1, field_2a as field2, extra_a as extra from ModelA

For ModelB:

ModelB.objects.all()
        .annotate(field1=F("field_1b"), field2=F("field_2b"))
        .values(*values)

SQL:

select extra, field_1b as field1, field_2b as field2 from ModelB

and the union is:

(select field_1a as field1, field_2a as field2, extra_a as extra from ModelA)
UNION
(select extra, field_1b as field1, field_2b as field2 from ModelB)

Because annotated fields are listed after the real db fields, the extra of ModelB is mixed with field1 of ModelB. TO make sure that you get correct results, please make sure that the ordering of fields in generated SQL is always correct - with or without annotation. In this case, I will suggest to annotate extra on ModelB as well.

AKS
  • 18,983
  • 3
  • 43
  • 54
  • Thanks for a detailed answer, but did you see the code after edit with slightly more complex models? It doesn't work if you switch the query order, and the example is closer to the real one. – Djent Mar 17 '20 at 06:10
  • @Djent I have updated the answer. Please check the 3rd section which explains the behavior you see with the complex model. – AKS Mar 17 '20 at 06:25
  • this looks like a bug in Django to me. I cannot annotate `extra` field as it is already in the model, and Django will reject that. Unfortunately, in my real-life example, I have many fields that have to be annotated in one model but are present in another. Guess the only thing I can do here is to leave the ORM and build the SQL query manually... – Djent Mar 17 '20 at 07:29
  • It indeed seems like a bug. What I meant by annotating `extra` field is that, you don't use the existing field names at all. You annotate all the fields with some new name, irrespective of whether they are present in a model or not. e.g. on `ModelA`, annotate the fields as `f1`, `f2` and `ex` and similarly do it in `ModelB`. That way you can force django to always use the annotated name because it won't conflict with the existing name in the db and it will also guarantee that the results appear in the order you want.. I think doing this will be less painful than writing SQL. – AKS Mar 17 '20 at 07:53
  • Right, that makes sense. Thanks for the help! – Djent Mar 17 '20 at 08:03
  • @Djent Glad that I was able to help. Just out of curiosity, what happened to the bounty here? – AKS Mar 17 '20 at 15:17
  • unfortunately it expired and the points are lost, sorry about that. – Djent Mar 18 '20 at 07:20
0

I dug through the docs and must admit I did not fully understand why your approach did not work (according to my understanding it should). What I believe is that applying union to querysets with different field names seems to be producing strange effects.

Anyway, applying values before doing the union seems to produce correct results:

query = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values('field1', 'field2')
query = query.union(ModelB.objects.all().annotate(field1=F("field_1b"), field2=F("field_2b")).values('field1', 'field2'))

Which results in this queryset

[
    {'field1': '1a1', 'field2': '1a2'}, 
    {'field1': '1b1', 'field2': '1b2'}, 
    {'field1': '2a1', 'field2': '2a2'}, 
    {'field1': '2b1', 'field2': '2b2'}
]
Chris
  • 2,162
  • 1
  • 6
  • 17
  • Right, it works for this example. But in my real-life one, which is more complicated, it still confuses the field, no matter if I'll apply `values()` before or after `union()`. When I won't use union, then values are correct for separate models. – Djent Mar 09 '20 at 06:09
-1

One answer I ask ChatGPT is: In this answer, just like above, we need to annotate the whole queryset,1 and change the value in union operation every time we want.

To make the story short, you must union separated querysets (the querysets, should not have anything in common!!).

Note: all annotation operation, must be before union in DJANGO

reason why this is happening (ChatGPT): When you perform a union operation in Django, the resulting queryset will have the annotations from the first queryset. In order to annotate different values for each union operation, you need to perform the annotations on the individual querysets before performing the union.



EXAMPLE WRONG:
the union is along with the annotation in each iteration.

    q = Text.objects.none()


    for page in range(start_page, end_page):
        sura_aya = json.loads(page_data[f"{page}"])
        sura_aya_next = json.loads(page_data[f"{page + 1}"])
        q = q | Text.objects.filter(sura=sura_aya[0]).annotate(page=Value(page))


EXAMPLE CORRECT:
the union is completely after the annotation operation.


querysets_to_union = []
    for page in range(start_page, end_page):
        sura_aya = json.loads(page_data[f"{page}"])
        sura_aya_next = json.loads(page_data[f"{page + 1}"])
        qs = Text.objects.filter(sura=sura_aya[0], aya__gte=sura_aya[1], aya__lte=sura_aya_next[1])
        qs = qs.annotate(page=Value(page))
        querysets_to_union.append(qs)

for qs in querysets_to_union[1:]:
    final_qs = final_qs.union(qs)



enter image description here

zamoosh
  • 37
  • 4