1

Got 2 tables - contacts and messages:

contact_id  |   contact_email
1           |   some@mail.com
2           |   other@mail.com
3           |   no@nono.com

message_id  |   message_recipients
1           |   1,2,3
2           |   3

message_recipients field contains ID(s) of contact(s) message was assigned to. Each message can have one or more IDs assigned, so they are separated by , symbol.

I need to show all contacts, and count of messages are assigned to each contact. Since message_recipients field may contain multiple IDs, I can't run a query like SELECT * FROM contacts, messages WHERE contacts.contact_id=messages.message_recipients because it won't work properly.

If I run SELECT * FROM contacts FULL JOIN messages, it returns many duplicated rows from contacts table. Sure thing, I can run SELECT * FROM contacts FULL JOIN messages GROUP BY contact_id, but this one returns only 1st message from messages table.

I know that in order to count how many messages each contact has assigned to, I will probably need to explode message_recipients field from each row into array and use code like if (in_array($contact_id, $message_recipients_array)) {$total++;} or similar. Now my main concern is how to all I need by writing as simple query as possible.

Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
Mindaugas Li
  • 1,071
  • 5
  • 15
  • 37
  • 1
    Fix your table structure. **Do not store multiple values in one cell**. See [**Normalization**](https://en.wikipedia.org/wiki/Database_normalization) – Gurwinder Singh Feb 28 '17 at 12:10
  • Any comments how should it look like? Sure thing, I can always store ONE id in message_recipients field, but in this case messages table will be filled with hundreds of duplicated records (where only ID field will be different, while fields with message text and other details will be the same). I don't think it's a good practice. – Mindaugas Li Feb 28 '17 at 12:12

2 Answers2

1

Fix your table structure. Do not store multiple values in one cell. See Normalization

For now, you can use FIND_IN_SET:

select c.contact_id,
    c.contact_email,
    count(*) no_of_messages
from messages m
join contacts c on find_in_set(c.contact_id, m.message_recipients) > 0
group by c.contact_id,
    c.contact_email

But this will be slow as it can't use any index on the contact_id or message_recipients.

To actually fix the issues, don't include recipient_id in the messages table.

You should have stored single recipient in one row in a separate mapping table with many to many relation with (maybe) the following structure.

messages_recipients (
    id int PK,
    message_id int FK referring message(message_id),
    message_recipient_id int FK referring contacts(contact_id)
)

Then all you had to do was:

select c.contact_id,
    c.contact_email,
    count(*) no_of_messages
from messages_recipients m
join contacts c on c.contact_id = m.message_recipient_id
group by c.contact_id,
    c.contact_email

This query is Sargable and will be faster.

Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • Do you mean that having thousands of duplicated entries in messages table is still better? – Mindaugas Li Feb 28 '17 at 12:16
  • Thanks. However, I believe it won't work in my scenario because editing messages later can become a headache. Now, when I edit a message, it gets updated for all assigned recipients at the same time. So if one message has 100 recipients, I edit message only once. If I have 100 separate records in messages table, I will need to edit 100 messages. Sure enough, I could do it automatically by assigning some hash value for each created message (so when message is edited, it will auto process other messages with the same hash), but I just don't know... – Mindaugas Li Feb 28 '17 at 12:21
  • If you have message details etc keep the message table separate. do not include receipient id in that table. Create a separate table with mapping between message_id and recipient_id. I've update it in my answer too. Please check – Gurwinder Singh Feb 28 '17 at 12:24
  • Thanks, will try... – Mindaugas Li Feb 28 '17 at 12:24
  • Just a quick update: followed this advice about creating mapping table, and everything seems to be fine. Just had to use a different SQL query (because the one provided in answer doesn't show contacts if they have no messages assigned), but besides that everything is alright. – Mindaugas Li Feb 28 '17 at 20:25
0

Fix your data structure! Storing ids in strings is a really bad idea. Why?

  • Numbers should be stored as numbers not strings.
  • SQL does not offer very good string functions.
  • Foreign key constraints should be properly expressed.
  • The query optimizer cannot use indexes or partitions.
  • SQL has a great method for storing lists: it is called a "table".

Sometimes, we are stuck with other people's really, really bad design decisions. MySQL does offer a method for doing what you want, find_in_set(). This is a hack to get around the short-comings of a bad data layout:

select . . .
from contacts c join
     messages m
     on find_in_set(c.contact_id, m.message_recipients) > 0
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the input. Any example of how proper structure should look like? Just keep in mind that one message may have HUNDREDS of contacts assigned, so if I create a separate record for each contact in messages table, it will be flooded with thousands of identical records (where only ID will be different). – Mindaugas Li Feb 28 '17 at 12:15