-1

I have 3 tables: contact, mailing_list and mailing_list_contact. Each contact can be subscribed to multiple mailing lists, recorded in mailing_list_contact.

To fetch the total contact count subscribed to a mailing list i use this query:

return $this->createQueryBuilder('contact')
->select('count(distinct contact.id)')
->leftJoin('contact.mailingListContacts', 'mlc')
->leftJoin('mlc.mailingList', 'ml')
->andWhere('ml IN (:mailingList)')
->setParameter('mailingList', $mailingList)
->getQuery()
->getSingleScalarResult();

This results in 7620 contacts and is correct.

When doing a NOT IN query, i get unexpected results and can not get my head around it. This is my NOT IN query which should return the contacts not subscribed to a single mailing list:

return $this->createQueryBuilder('contact')
->select('count(distinct contact.id)')
->leftJoin('contact.mailingListContacts', 'mlc')
->leftJoin('mlc.mailingList', 'ml')
->andWhere('ml NOT IN (:mailingList) or mlc.mailingList is null')
->setParameter('mailingList', $mailingList)
->getQuery()
->getSingleScalarResult();

This produces a contact count of 9064. The total contact count is 9109. I am expecting that this query would return 1489 contacts. What am i doing wrong?

Will B.
  • 17,883
  • 4
  • 67
  • 69
Anthony G
  • 9
  • 3

1 Answers1

0

The query criteria ml NOT IN (:mailingList) or mlc.mailingList is null is retrieving the number of the contacts that are subscribed to any mailing list except those specified as well as those that are not subscribed to any mailing list.

Dataset Criteria

The issue appears to be a misunderstanding of how SQL limits datasets based on the specified criteria(s).

Using JOIN in conjunction with a WHERE criteria on the JOIN table to limit the dataset, the database will first retrieve all matching records within the JOIN table prior to filtering them within the context of the SELECT table, which can lead to unexpected results. Additionally causing decreased performance when a significant amount of records are in the JOIN table.

Instead, the datasets can be limited for the SELECT and JOIN tables individually by specifying a criteria for each. Appending the desired criteria to the JOIN ON clause using AND prevents unexpected results and improves performance by limiting the dataset of the join table, while also utilizing INNER JOIN when appropriate to further limit the dataset of the select table.

Use INNER JOIN to retrieve only the records that match.
Use LEFT JOIN to retrieve records regardless of if they match or not.

SELECT s.id 
FROM s 
INNER JOIN j 
ON j.s_id = s.id 
AND j.id = 1

Doctrine DQL Functionality

The Doctrine ORM QueryBuilder produces DQL.

In DQL, add WITH to append the AND portion of the criteria to the automatically referenced JOIN ON clause.

/* ORM QueryBuilder */
->leftJoin('Entity.association', 'a', 'WITH', 'a.property IN(:param)')

/* DQL */
LEFT JOIN Entity.association a WITH a.property IN(:param)

/* Resulting SQL */
LEFT JOIN association_table AS a
ON a.entity_id = entity.id
AND a.property IN(:param)

Additionally using IDENTITY() retrieves the associated Identity column without needing an additional JOIN to establish the relationship.

/* ORM QueryBuilder */
->leftJoin('Entity.association', 'a', 'WITH', 'IDENTITY(a.association) IN(:param)')

/* DQL */
LEFT JOIN Entity.association a WITH IDENTITY(a.association) IN(:param)

/* Resulting SQL */
LEFT JOIN association_table AS a
ON a.entity_id = entity.id
AND a.association_id IN(:param)

Count Contacts subscribed to the specified mailing lists

DB-Fiddle - Query #4

To determine the number of contacts subscribed to the specified mailing lists, limit the dataset of the contacts by using INNER JOIN and move the WHERE criteria to the ON clause.

$this->createQueryBuilder('contact')
    ->select('count(distinct contact.id)')
    /* retrieve only contact subscriptions with the specified mailing lists */
    ->innerJoin('contact.mailingListContacts', 'mlc', 'WITH', 'IDENTITY(mlc.mailingList) IN(:mailingList)')
    ->setParameter('mailingList', $mailingList);

/* Resulting SQL */
SELECT COUNT(DISTINCT contact.id)
FROM contact
INNER JOIN mailingListContacts AS mlc 
ON mlc.contact_id = contact.id
AND mlc.mailing_list_id IN(:mailingList);

Count Contacts without Subscriptions

DB-Fiddle - Query #5

To determine the number of contacts that are not subscribed to any mailing list, an exclusion join can be used as LEFT JOIN AS j WHERE j.s_id IS NULL to retrieve the records that do not exist in the join table.

$this->createQueryBuilder('contact')
    ->select('count(distinct contact.id)')
    /* retrieve all contact subscriptions */
    ->leftJoin('contact.mailingListContacts', 'mlc')
    /* limit to only missing contacts */
    ->andWhere('mlc.contact is null');

/* Resulting SQL */
SELECT COUNT(DISTINCT contact.id)
FROM contact
LEFT JOIN mailingListContacts AS mlc 
ON mlc.contact_id = contact.id
WHERE mlc.contact_id IS NULL;

Count Contacts not subscribed to the specified mailing lists

DB-Fiddle - Query #6

To determine the number of contacts that are not subscribed to specific mailing lists, expand the above exclusion join query by appending the criteria to the JOIN ON clause to limit the dataset of the join table to only the desired mailing lists to validate.

$this->createQueryBuilder('contact')
    ->select('count(distinct contact.id)')
    /* retrieve all contact subscriptions limited to only the specified mailing lists */
    ->leftJoin('contact.mailingListContacts', 'mlc', 'WITH', 'IDENTITY(mlc.mailingList) IN(:mailingLists)')
    /* limit to only missing contacts */
    ->andWhere('mlc.contact is null');

/* Resulting SQL */
SELECT COUNT(DISTINCT contact.id)
FROM contact
LEFT JOIN mailingListContacts AS mlc 
ON mlc.contact_id = contact.id
AND mlc.mailing_list_id IN(:mailingList)
WHERE mlc.contact_id IS NULL;

Count Contacts with subscriptions except the specified mailing lists

DB-Fiddle - Query #7

To determine the number of contacts with subscriptions excluding the specified mailing lists, change the LEFT JOIN to an INNER JOIN, as well as IN() to NOT IN(), and remove the WHERE clause.

$this->createQueryBuilder('contact')
    ->select('count(distinct contact.id)')
    /* retrieve all contact subscriptions not subscribed to the specified mailing lists */
    ->innerJoin('contact.mailingListContacts', 'mlc', 'WITH', 'IDENTITY(mlc.mailingList) NOT IN(:mailingLists)');

/* Resulting SQL */
SELECT COUNT(DISTINCT contact.id)
FROM contact
INNER JOIN mailingListContacts AS mlc 
ON mlc.contact_id = contact.id
AND mlc.mailing_list_id NOT IN(:mailingList);
Will B.
  • 17,883
  • 4
  • 67
  • 69