1

So what I'm trying to do here is get a report on how many emails (with a MailChimp like app) were sent by different users, but I want two different metrics in one query. I want to know how many individual emails were sent by each user. Meaning if they sent 3 emails to 100 contacts each, that would display 300. But I also want to know how many unique emails were sent, meaning that would display 3.

I'd like to get something that looks like:

-------------------------------------------------------------
| Full Name   | Username        | Total Sent | Unique Mails |
|-------------|-----------------|------------|--------------|
| John Doe    | jdoe@mail.com   | 12000      | 4            |
| James Smith | jsmith@mail.com | 6000       | 12           |
| Jane Jones  | jjones@mail.com | 4000       | 2            |
| ...         | ...             | ...        | ...          |
-------------------------------------------------------------

So I could know that John sends a few emails to a lot of contacts while James sends more emails to fewer contacts.

Here's what my query looks like. I've changed the table and column names, but this is otherwise an exact representation of what it is.

SELECT
    CONCAT(Usernames.FirstName, ' ', Usernames.LastName) AS 'Full Name',
    Usernames.Username,
    COUNT(Sent_Mail_Contacts.IDContact) AS `Total Sent`,
    COUNT(Mass_Mail.IDMass_Mail) AS `Individual E-Mails`
FROM Usernames
LEFT JOIN Sent_Mail_Contacts ON Usernames.Username = Sent_Mail_Contacts.Username
LEFT JOIN Mass_Mail ON Usernames.Username = Mass_Mail.Username
GROUP BY Usernames.Username
ORDER BY `Total Sent`

I have a table with Usernames, a table with individual contacts reached by which emails and a table with unique emails.

So does my query make sense or not? Is this even possible? Because right now when I run it, it gives me something like this:

-------------------------------------------------------------
| Full Name   | Username        | Total Sent | Unique Mails |
|-------------|-----------------|------------|--------------|
| John Doe    | jdoe@mail.com   | 12000      | 12000        |
| James Smith | jsmith@mail.com | 6000       | 6000         |
| Jane Jones  | jjones@mail.com | 4000       | 4000         |
| ...         | ...             | ...        | ...          |
-------------------------------------------------------------

I just gives me the same number in both columns and takes 7 minutes to process.

Here is an example of what the 3 tables would look like separately if that can help:

Usernames
------------------------------------------------
| Username        | FirstName | LastName | ... |
|-----------------|-----------|----------|-----|
| jdoe@mail.com   | John      | Doe      | ... |
| jsmith@mail.com | James     | Smith    | ... |
| jjones@mail.com | Jane      | Jones    | ... |
| ...             | ...       | ...      | ... |
------------------------------------------------

Mass_Mail
----------------------------------------------------
| ID_Mass_Mail | Username       | Date       | ... |
|--------------|----------------|------------|-----|
|            1 | jdoe@mail.com  | 2019-01-16 | ... |
|            2 | jdoe@mail.com  | 2019-01-29 | ... |
|            3 | jjones@mail.com| 2019-02-14 | ... |
|          ... | ...            | ...        | ... |
----------------------------------------------------

Sent_Mail_Contacts
---------------------------------------------------------------------
| ID_Mass_Mail | Username       | Contact_ID | Contact_Email  | ... |
|--------------|----------------|------------|----------------|------
|            1 | jdoe@mail.com  |          1 | bob@mail.com   | ... |
|            1 | jdoe@mail.com  |          2 | jim@mail.com   | ... |
|            1 | jdoe@mail.com  |          3 | cindy@mail.com | ... |
|          ... | ...            |        ... | ...            | ... |
|            2 | jdoe@mail.com  |          4 | mike@mail.com  | ... |
|            2 | jdoe@mail.com  |          2 | jim@mail.com   | ... |
|            2 | jdoe@mail.com  |          3 | cindy@mail.com | ... |
|          ... | ...            |        ... | ...            | ... |
---------------------------------------------------------------------
G. Langlois
  • 75
  • 1
  • 8
  • On a side note: The data model seems a bit flawed. Why does `Sent_Mail_Contacts` have both a `Contact_ID`and a `Contact_Email`? If contacts have an ID, then there should be a contact table and that table should have one row per contact ID and contain the related email address. Then you have the `Username` in `Sent_Mail_Contacts`. Why? Is `ID_Mass_Mail` nullable, so there are rows where you cannot get the user from there? Otherwise, what would different user names in `Mass_Mail` and `Sent_Mail_Contacts` indicate? An email with a fake account? Get your database rid of redundancies. Normalize. – Thorsten Kettner Mar 01 '19 at 22:29

4 Answers4

1

Use COUNT(DISTINCT ...) :

SELECT
    CONCAT(Usernames.FirstName, ' ', Usernames.LastName) AS 'Full Name',
    Usernames.Username,
    COUNT(Sent_Mail_Contacts.IDContact) AS `Total Sent`,
    COUNT(DISTINCT Mass_Mail.IDMass_Mail) AS `Individual E-Mails`
FROM Usernames
LEFT JOIN Sent_Mail_Contacts ON Usernames.Username = Sent_Mail_Contacts.Username
LEFT JOIN Mass_Mail ON Usernames.Username = Mass_Mail.Username
GROUP BY Usernames.Username
ORDER BY `Total Sent`

NB : this will not make the query any faster though. To start with, you should at least make sure that you are using primary/foreign keys relations in the JOINs : Usernames(Username), Sent_Mail_Contacts(Username), Mass_Mail(Username)

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Maybe I wasn't clear enough. I'm sorry. What I meant is that there is only one IDMass_Mail per email sent. The email number 1457 can be sent to 100 contacts and while in the table "Sent_Mail_Contacts", it will show a line for every contact, they will all have the same IDMass_Mail, which is 1457 and in the table Mass_Mail, the IDMass_Mail 1457 will only be there once. I don't know if I'm being clear sorry, I'm a little new to all this. – G. Langlois Mar 01 '19 at 20:47
  • @G.Langlois : mm, ok, I think I misread your question a little. It would be good if you could edit your question to add sample data for the 3 input table, and the corresponding expected results. See [this link](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query). For the time being I will delete my answer. – GMB Mar 01 '19 at 20:51
  • @G.Langlois : given your sample data, it looks like `DISTINCT` is really what you need... So I undeleted my answer. – GMB Mar 01 '19 at 23:49
  • It was indeed what I needed. I've posted an answer to show what I did and it worked perfectly. Thank you very much! – G. Langlois Mar 04 '19 at 20:03
1

Assuming the values in IDMass_Mail indicate a unique email, then you just need to edit the last COUNT to use the DISTINCT keyword.

COUNT(DISTINCT Mass_Mail.IDMass_Mail) AS `Individual E-Mails`

That will return the number of unique values in the grouping by Username.

You should also get a performance boost if you're able to add indexes to the Username columns in the Sent_Mail_Contacts and Mass_Mail tables.

Eric Brandt
  • 7,886
  • 3
  • 18
  • 35
  • Hi Eric. Thanks for your answer. Someone before you answered the same thing and I'm not sure it would work. Maybe I didn't understand your answer properly, but the issue I see with it is that the IDMass_Mail is indeed unique and there's no need to count distinct IDMass_Mail because in the Mass_Mail table, they only show up once. I want to know how many IDMass_Mail there is for every Username. I've added more details about what the 3 tables look like in my question. Maybe you can check it out and see if that helps. – G. Langlois Mar 01 '19 at 21:18
  • From your sample data, in the `Sent_Mail_Contacts` table, if you group by `Username`, there will be two distinct values of `ID_Mass_Mail`, 1 and 2. So that's two unique mailings and a total sent of six from one user. That's the result set you're looking for. Grab a subset of data and try it out so you're not waiting forever on it. – Eric Brandt Mar 01 '19 at 21:51
  • I've tried to use the DISTINCT function and it does work... kind of, because for some reason I get accurate data for most users, yet a few of them are showing 0 unique emails even though they send emails to 10,000 contacts... But for the most part it does work as intended. Also I've tried going for the same query but limited to the last few days and it was much faster. I think that your answer is accurate, but that I'm missing some information... There are other variables I may have not taken into account in our database. I'll get back to this on Monday. Thanks a lot for your help – G. Langlois Mar 01 '19 at 22:04
1

I managed to do it using a query that (besides from changing the actual table and column names due to privacy concerns) looked exactly like this.

SELECT
    Accounts.Account_Name AS `account`,
    Usernames.Username AS `username`,
    COUNT(Mass_Mail_Reached_Contacts.ID_Contact) AS `total_emails`,
    COUNT(Mass_Mail_Reached_Contacts.ID_Mass_Mail) / 
        (
            SELECT COUNT(*)
                FROM 
                    Mass_Mail_Reached_Contacts 
                WHERE
                    Mass_Mail_Reached_Contacts.DATE >= '2019-02-01'
                    AND
                    Mass_Mail_Reached_Contacts.DATE <= '2019-02-28'
        )
    * 100 AS `%`,
    COUNT(DISTINCT Mass_Mail.ID_Mass_Mail) AS `unique_emails`,
    COUNT(Mass_Mail_Reached_Contacts.ID_Mass_Mail) / 
        COUNT(DISTINCT mass_mail.ID_Mass_Mail) 
        AS `avg_contacts_per_email`

FROM
    Usernames

LEFT JOIN Mass_Mail_Reached_Contacts ON Mass_Mail_Reached_Contacts.Username = Usernames.Username
LEFT JOIN Account ON Account.ID_Account = Usernames.ID_Account
LEFT JOIN Mass_Mail ON Mass_Mail.ID_Mass_Mail = Mass_Mail_Reached_Contacts.ID_mass_mail

WHERE
    Mass_Mail_Reached_Contacts.DATE >= '2019-02-01'
    AND
    Mass_Mail_Reached_Contacts.DATE <= '2019-02-28'

GROUP BY
    Usernames.Username

HAVING COUNT(DISTINCT Mass_Mail.IDMass_Mail) > 0

ORDER BY
    `total_emails` DESC

I'm now able to get a table that looks like this

Emails Stats
--------------------------------------------------------------------------------------
| account  | username     | total_emails |     % | unique_emails | avg_contact_email | 
|----------|--------------|--------------|-------|------------------------------------
| Bob inc. | bob@mail.com | 28,550       | 14.52 |            12 |           2379.17 |
| ...      | ...          | ...          |   ... |           ... |               ... |
--------------------------------------------------------------------------------------
G. Langlois
  • 75
  • 1
  • 8
0

To start with: Why do Mass_Mail and Sent_Mail_Contacts both contain a Username? This looks redundant. Or is Sent_Mail_Contacts.ID_Mass_Mail nullable?

For this query at least, I suppose we can ignore the Username in Sent_Mail_Contacts completely. What really links the two tables is the ID_Mass_Mail, and you have forgotten this join criteria in your query.

select
  ws_concat(' ', u.firstname, u.lastname) as full_name,
  u.username,
  count(smc.idmass_mail) as total_sent,
  count(mm.idmass_mail) as individual_e_mails
from usernames u
left join mass_mail mm on mm.username = u.username
left join sent_mail_contacts smc on smc.id_mass_mail = u.id_mass_mail
group by u.username
order by total_sent;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Thanks. I managed to achieve my goal by using the ID_Mass_Mail rather than the Username to join the two tables and it worked. Also, I improved my query and managed to bring the processing time down to 12 seconds. – G. Langlois Mar 04 '19 at 19:46