1

again I would like to search for duplicates in my models, but now slightly different case.

Here are my models:

class Concept(models.Model):
    main_name = models.ForeignKey(Literal)
    ...
class Literal(models.Model):
    name = models.Charfield(...)
    concept = models.ForeignKey(Concept)
    ...

And now the task I'm trying to achieve: Select all literals that are NOT main_names, that have the same name for the same concept.

For example if I have literals:

[{id:1, name:'test', concept:1}, {id:2, name:'test', concept:1}]

and concepts:

[{id:1, main_name:1}]

Then in result I should get literal with the ID=2.

Community
  • 1
  • 1
dragoon
  • 5,601
  • 5
  • 37
  • 55

2 Answers2

1

If I understand your question you want:

  1. All Literal objects that are not ForeignKey'd to Concept.
  2. From that set, select those where the name and the concept is the same.

If so, I think this should work:

For the first part:

q = Literal.objects.exclude(pk__in=Concept.objects.values_list('id', flat=True))

EDIT:

Based on excellent feedback from Jan, I think for #2 you would need to use raw SQL.

Burhan Khalid
  • 169,990
  • 18
  • 245
  • 284
  • 1
    It should be `values_list` and I think your second part doesn't do anything. Do you mean `q.filter(name=F('concept__main_name__name'))`? I'm not sure if this is optimal in performance though. – Jan Pöschko Feb 12 '12 at 17:45
  • I'm not sure about the performance part to be honest; but I think the F expression is correct as per my understanding of the docs. Thanks for the `values_list`, edited. – Burhan Khalid Feb 12 '12 at 17:52
  • Does it not just test whether the literal's name equals its own name (same for concept), which is of course true for all literals? – Jan Pöschko Feb 12 '12 at 17:54
  • OK. I posted a possible solution in my comment and think that it *does* actually work, without using raw SQL but foreign-key lookups. – Jan Pöschko Feb 12 '12 at 18:11
  • Yep, I know have to select all not main literals, this is fairly simple, my insterest was how to perform the whole task. Thanks for the help anyway. – dragoon Feb 12 '12 at 18:13
  • @JanPöschko, in the query in your comment you will get dulicates throughout the whole table, but I need only duplicates for the same concept. – dragoon Feb 12 '12 at 18:17
1

It sounds to me as though you want to execute a SQL query something like this:

SELECT l1.* FROM myapp_literal AS l1,
                 myapp_literal AS l2
WHERE l1.id <> l2.id
  AND l1.name = l2.name
  AND l1.concept = l2.concept
  AND l1.id NOT IN (SELECT main_name FROM myapp_concept)
GROUP BY l1.id

Well, in cases where the query is too complex to easily express in Django's query language, you can always ask Django to do a raw SQL query—and this may be one of those cases.

Gareth Rees
  • 64,967
  • 9
  • 133
  • 163
  • 1
    Yes, I think I gives exactly what I want, except that you need to specify l1.id instead of l1.*, otherwise it will be `column must appear in the GROUP BY clause or be used in an aggregate function` – dragoon Feb 12 '12 at 19:12
  • could you also tell me what should I write do delete everything selected from l1? – dragoon Feb 12 '12 at 19:18
  • I guess I'm too used to MySQL, which [allows non-aggregated columns in the `SELECT` part of `GROUP BY` queries](http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html). You're quite right to point out that in most databases this won't be allowed. – Gareth Rees Feb 12 '12 at 19:36
  • To delete the records matching a query, you can [use Django's `delete` method](https://docs.djangoproject.com/en/1.3/topics/db/queries/#deleting-objects) on the query. Something like `Literal.objects.raw('SELECT ...').delete()` But take care—the query I wrote selects *all* records with duplicates—you probably want to delete *all but one* of each matching set. – Gareth Rees Feb 12 '12 at 19:40