2

Complete newbie to mySQL. So any help will be appreciated.

I have 3 tables -- carts, users, actions.

carts:
+------------+-------------+-------+
| cartId     | session_id  | userId| 
+------------+-------------+-------+

users:
+----------+-------------+
| usedId   | email       |
+----------+-------------+

actions:
+-------------+------------------+---- ---------+
| session_id  | impressionAction | impressionId | 
+-------------+------------------+-----+--------+

In carts, there is one session_id per line.

In users, there is one userId per line.

In actions, there are multiple lines per session_id counting for all the actions for that session.

I would like to JOINthe three tables getting the output to be something like

+------+-------------+--------+------------------+--------------+-------+
userId | session_id  | cartId | impressionAction | impressionId | email |
+------+-------------+--------+------------------+--------------+-------+

Where there will be multiple lines per userId and session_id; essentially a flattened file. I think if we JOIN carts and users on userId resulting in say A and then JOIN A and actions' onsession_id`, we are home.

A sample expected output is:

+------------+-------------+--------+------------------+--------------+---------+
userId       | session_id  | cartId | impressionAction | impressionId | email   |
+------------+-------------+--------+------------------+--------------+---------+
| 1234       | abc3f45     | 0001   | LOGIN            | 2032         |ab@yc.com|
| 1234       | abc3f45     | 0001   | ADD              | 4372         |ab@yc.com|
| 1234       | abc3f45     | 0001   | ADD              | 4372         |ab@yc.com|
| 1234       | abc3f45     | 0001   | SENDMAIL         | ab@yc.com    |ab@yw.com| 
| 4567       | def4rg4     | 0002   | LOGIN            | 2032         |db@yw.com|
| 4567       | def4rg4     | 0002   | ADD              | 4372         |db@yw.com|
| 4567       | def4rg4     | 0002   | REMOVE           | 3210         |db@yw.com|
+------------+-------------+--------+------------------+--------------+---------+** 

I don't know how to JOIN 3 tables without one common key. I don't even know what type of join it is called.

Essentially, we are trying to join 3 tables with non-overlapping keys, gathering one common key through the first JOIN and then joining the intermediate with the third one. Is this called a CROSS JOIN? If no, is there a name?

eager_learner313
  • 97
  • 1
  • 3
  • 11
  • this could get ugly. you're doing a tree-based joining. `A->B` and `A->C`. You're going to get undefined results, especially in the case where B and C have different numbers of matching records for any "parent" record in A. It kind of looks like the carts:users relationship is 1:1, but if it's 1:n/n:1, then all bets are off. – Marc B Jul 08 '14 at 18:50
  • A user can have multiple carts? – Relevant Jul 08 '14 at 18:51
  • @Relevant: yes, s/he can, which would yield a different `session_id` – eager_learner313 Jul 08 '14 at 18:56
  • Could you provide a sample expected output? – Relevant Jul 08 '14 at 18:58
  • Yes, I will add it to the question. – eager_learner313 Jul 08 '14 at 19:10
  • 1
    @eager_learner313 Check this [SQL Fiddle](http://sqlfiddle.com/#!2/95a8b/2) If is this what you're looking for, I'll paste the statement as an answer :) – hex494D49 Jul 08 '14 at 19:21
  • @hex494D49 thanks will check it. In the meantime, can you take a look at the expected output provided in case you missed it? I just updated it. – eager_learner313 Jul 08 '14 at 19:35
  • @eager_learner313 Updated [SQL Fiddle](http://sqlfiddle.com/#!2/9825a/1) – hex494D49 Jul 08 '14 at 19:44
  • @hex494D49: I think you have it my friend. I am so sorry to say this, but I just realized one part of my update never got posted. Please take a look at this. Again, apologies for this. I didn't see that. – eager_learner313 Jul 09 '14 at 16:49
  • @eager_learner313 Shall I paste it down as an answer :) ? – hex494D49 Jul 09 '14 at 16:51
  • @hex494D49: yes you may! However, I'd like you to explain the motivation and the reasoning behind the code, for that is what I am looking for-- to learn and get some heuristics down. Why join what table on what, how did you visualize the solution from the sample output, etc.? The devil is always in the details and I am struggling with JOINs and doing them properly. – eager_learner313 Jul 09 '14 at 17:52
  • @eager_learner313 I had lots of similar situation. I just folloe some logic :) In this case users are at the top, they do have sessions, they do something with the card etc... If you explain in more details your situation maybe I could suggest you a better solution :) For example, what's the card, a shopping card or something else, what is exactly impression and what makes it different from an action... – hex494D49 Jul 09 '14 at 17:56
  • @hex494D49: can you add this explanation to your answer? Here is a detailed explanation--- a USER may select many products, add to their CART, a single USER may have multiple CARTS, at the end of the event, they can EMAIL the cart to them. The ACTIONS of the user are stored in the `actions` table. I am just trying to have a flattened file containing the information from all the sources. Another thing: so in your code, are you JOINING CARTS to both USERS and ACTIONS? I am bit confused about that. – eager_learner313 Jul 09 '14 at 18:12
  • @eager_learner313 I see. I'll try to explain all in details in a meantime. Actually, I'm joining users, to the actions through the cards. Anyhow, I'll check the statement once again 'cause the whole process is clearer now :) – hex494D49 Jul 09 '14 at 18:22
  • @hex494D49: yes, if you can explain what is going on with the two JOIN statements on 1 table, that will be great. Also, why is the `carts` table the main one for everything else to be `JOINED` on?-- things like these – eager_learner313 Jul 09 '14 at 20:08

2 Answers2

5

Taken from your comment above

A USER may select many products, add them to their CART; a single USER may have multiple CARTS and at the end of the event, they can EMAIL the cart to themselves; the ACTIONS of the user are stored in the actions table

This is how I see the structure (having in mind your data)

+---------------------+     +---------------------+     +---------------------+
| users               |     | carts               |     | actions             |
+---------------------+     +---------------------+     +---------------------+
| user_id       [PK]  |--|  | cart_id       [PK]  |     | impression_id [PK]  |
| email               |  |--| user_id       [FK]  |     | action_name         |
|                     |     | product_id    [FK]  |  |--| session_id    [FK]* |
+---------------------+     | session_id    [FK]* |--|  |                     |
                            |                     |     +---------------------+
                            +---------------------+    

As you can see above, I'm joining first with carts and them with actions because only the table carts has both, user and session data.

The [FK]* next to the session_id on carts and actions could seem as a foreign key but in this case it's not - 'cause there's no separate table for sessions where it would be placed as an PK (primary key)

You asked about join - it is the same as inner join. INNER JOIN creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate.

This is a possible content of the tables

+------------------------+
| users                  |
+------------------------+
| id   | email           |
+------+-----------------+
| 1    | first@mail.org  |
| 2    | second@mail.org |
| 3    | third@mail.org  |
+------+-----------------+

+------------------------------------------+
| carts                                    |
+------------------------------------------+
| id   | user_id | product_id | session_id |
+------+---------+------------+------------+
| 1    | 1       | 5          | 1aaaa      |
| 2    | 2       | 5          | 2ffff      |
| 3    | 3       | 8          | 3ddddd     |
| 4    | 1       | 5          | 1aaaaa     |
| 5    | 3       | 9          | 3bbbbb     |
| 6    | 1       | 6          | 1ccccc     |
+------+---------+------------+------------+

+-------------------------------+
| actions                       |
+-------------------------------+
| id   | name      | session_id |
+------+-----------+------------+
|  1   | ADD       | 1aaaa      |
|  2   | ADD       | 2ffff      |
|  3   | SENDMAIL  | 3ddddd     |
|  4   | ADD       | 3ddddd     |
|  5   | SENDMAIL  | 2ffff      |
|  6   | ADD       | 1aaaaa     |
|  7   | REMOVE    | 3ddddd     |
|  8   | ADD       | 1ccccc     |
|  9   | ADD       | 3bbbbb     |
| 10   | SENDMAIL  | 3bbbbb     |
+------+-----------+------------+

As you can see, there are six products in the table carts and exactly six add actions in the table actions. Furthermore, as you can see user with an id=1 bought three products but not at the same time, since there are two sessions; user with an id=3 as well, bought these two products in different times etc...

The SQL statement

SELECT u.user_id, c.session_id, c.cart_id, a.impression_id, a.action_name, u.email
FROM users AS u
INNER JOIN carts AS c ON c.user_id = u.user_id
INNER JOIN actions AS a ON a.session_id = c.session_id
ORDER BY u.user_id, c.session_id, c.cart_id

Results:

+---------+------------+---------+---------------+-------------+-----------------+
| user_id | session_id | cart_id | impression_id | action_name | email           |
+---------+------------+---------+---------------+-------------+-----------------+
| 1       | 1aaaa      | 1       | 1             | ADD         | first@mail.org  |
| 1       | 1aaaa      | 1       | 6             | ADD         | first@mail.org  |
| 1       | 1aaaa      | 4       | 1             | ADD         | first@mail.org  |
| 1       | 1aaaa      | 4       | 6             | ADD         | first@mail.org  |
| 1       | 1cccc      | 6       | 8             | ADD         | first@mail.org  |
| 2       | 2ffff      | 2       | 5             | SENDMAIL    | second@mail.org |
| 2       | 2ffff      | 2       | 2             | ADD         | second@mail.org |
| 3       | 3bbbb      | 5       | 9             | ADD         | third@mail.org  |
| 3       | 3bbbb      | 5       | 10            | SENDMAIL    | third@mail.org  |
| 3       | 3dddd      | 3       | 3             | SENDMAIL    | third@mail.org  |
| 3       | 3dddd      | 3       | 4             | ADD         | third@mail.org  |
| 3       | 3dddd      | 3       | 7             | REMOVE      | third@mail.org  |
+---------+------------+---------+---------------+-------------+-----------------+

Note: There's no guarantee for session uniqueness.

(Updated) Working SQL Fiddle


UPDATE: (Finding and deleting duplicates)

I've updated the SQL Fiddle in order to simulate duplicate records (when user added the same product within the same session). With the following statement you'll be able to retrieve those duplicated rows.

SELECT c.card_id, c.user_id, c.product_id, c.session_id, a.action_name, a.impression_id
FROM cards As c
INNER JOIN actions AS a ON a.session_id = c.session_id
GROUP BY c.user_id, c.product_id, c.session_id, a.action_name
HAVING count(*) > 1

Results:

+---------+------------+------------+------------+-------------+-----------------+
| card_id | user_id    | product_id | session_id | action_name | impression_id   |
+---------+------------+------------+------------+-------------+-----------------+
| 1       | 1          | 5          | 1aaaa      | ADD         | 1               |
| 6       | 1          | 6          | 1cccc      | ADD         | 8               |
+---------+------------+------------+------------+-------------+-----------------+

In the SELECT part of the statement above you may omit everything except card_id and impression_id. Deleting these two duplicates in one statement is a bit tricky since you can't modify the same table selected in a sub-query within the same query. I would avoid the tricky part in this case (which involves another inner sub-query) and would delete duplicates using separate statements as following

-- delete duplicates from cards
--
DELETE FROM WHERE card_id IN (1,6)

-- delete duplicates from actions
--
DELETE FROM WHERE card_id IN (1,8)

Even better, you could check if the user already has been added a selected product and don't add it twice.

hex494D49
  • 9,109
  • 3
  • 38
  • 47
  • Real quick, as far as the mechanism of joins go, is the following happening? `carts` is joined with `users` to form, say, `carts_user` table which then gets joined with `actions` to form the final table `carts_users_actions`? also, does the order of the tables matter? Could I have done: `FROM carts AS c INNER JOIN users AS u ON c.user_id = u.user_id INNER JOIN actions AS a ON a.session_id = c.session_id ORDER BY u.user_id, c.session_id, c.cart_id` ? that is, put `carts` as the first table? – eager_learner313 Jul 10 '14 at 03:53
  • @eager_learner313 In general, `join` order does matter but not if you use `inner join`; though, that isn't a proper perspective, at least to me, since the user is the one to start an action, not the cart itself :) When using `join` think like Honestly, I was thinking about this, and somehow I don't like joining tables on non primary / foreign keys. (c.session <-> a.session in this case); – hex494D49 Jul 10 '14 at 08:51
  • Furthermore, there's no guarantee for session uniqueness (if you use the one provided out-of-the-box) and it could cause performance issue since it's a varchar value. What do you use on server-side (php, java, c#) and could you make a print-screen of your database schema - maybe I could suggest a better design :) – hex494D49 Jul 10 '14 at 08:51
  • thanks for the clarifications! I think there were some typos to the message starting with "In general, join order".. it stops making sense after the first sentence or so. Can you fix that? :) – eager_learner313 Jul 10 '14 at 16:19
  • @eager_learner313 Oh sorry, I see... the half of sentence is gone. I wanted to say, when using `join` think like you're making a new table, then this table can be joined based on some key to another, ans so one... Another advice, don't use `varchar` for session_id in this case - generate an `integer` or convert the given varchar session to an integer; and mark it as `unique` column. – hex494D49 Jul 10 '14 at 22:57
  • a follow-up question. I am seeing that the same product gets added TWICE in a single session by **mistake**. I would like to remove `duplicates` of the `impressionId` where `ImpressionAction = ADD`for the **same** `cartId`. Not that the same `product` may be added in 2 different carts under a single session. For eg., session `1aaa` can have the `impressionId =6` (which is the product ID) added twice for the `cartId = 1`. How do I remove this duplicate pair? – eager_learner313 Jul 23 '14 at 03:09
  • @eager_learner313 Hi :) Check the updated answer; the last section about finding and deleting duplicated records. – hex494D49 Jul 23 '14 at 09:25
  • You are so cool! Thanks for the update. However, I think it quite get it. Why is `cartId = 6` showing up? It's the same `userId` but under different `session_id`. Also, I am not quite sure about how to go about and manually deleting the duplicates as mentioned in your deleting duplicates from `carts` and `actions`. If we can remove duplicates from `actions` and then `JOIN`, we are home. Essentially, only have `distinct` `impression_id`s under any single `session_id` when `impressionAction = ADD`. Is there any way we may chat over here or on any other medium? I think that will be better. – eager_learner313 Jul 23 '14 at 15:38
  • @eager_learner313 This is a general statement for retrieving duplicate records `SELECT column FROM table GROUP BY column HAVING count(column) > 1` I was adding another duplicate record and that's why there are two records showing up (card_id 1 and 6). Since card_id and impression_id are auto-increment columns you should check user_id, product_id, session_id and action_name in order to find duplicates. Although finding and deleting might be written as a procedure I prefer doing it using a helper routine in PHP (in this case). I'll update the answer in the meantime. – hex494D49 Jul 23 '14 at 20:47
  • Got it! I am on EST. What time zone are you on? – eager_learner313 Jul 24 '14 at 15:11
  • sent you an email. Did you get it by any chance? – eager_learner313 Aug 28 '14 at 19:51
  • @eager_learner313 I just saw it, 'cause I don't check this account every day. Let me read it and I'll write you back. – hex494D49 Aug 28 '14 at 19:54
  • Is there an email that you check more frequently? If so, would you mind sharing that in a reply to my email? (NOT HERE) – eager_learner313 Sep 02 '14 at 20:09
  • @eager_learner313 I replied to your last email. You didn't get it? – hex494D49 Sep 02 '14 at 20:27
  • Nope. Just checked it. Any chance you can resend? Sorry about this; not sure why I am not getting it. I also checked my SPAM folder – eager_learner313 Sep 02 '14 at 23:27
1

Excuse my MySql syntax, as I don't know it :-p But this is the idea

SELECT u.userId, a.session_id, c.cartId, a.impressionAction, a.impressionId, u.email
FROM Carts c 
JOIN Users u on u.userId = c.UserId
JOIN Actions a on a.session_id = c.session_id

This will just merge everything together, and you'll have duplicate cart records if you have many to 1 relationships

Relevant
  • 148
  • 1
  • 15