-1

In SQL I'm trying to combine multiple tables and grab the SUM of expenses per person, and sort those by highest total expense first. I have 3 tables:

  • test1 (from grocery store #1)
  • test2 (from grocery store #2),
  • junction1 (one that I just created to somehow try to connect test1 and test2 together)

I cannot edit test1 and test2 in the production environment. I created junction1 as a bridge to connect test1 and test2. I can modify columns/content in junction1. The IDs of test1 and test2 may change in the future (right now they are the same).

Desired result:

Desired result table

I need to do a full join on all tables, since I want to include all personnel from both tables. test1 and test2 are independent, as some people only shop in test1 locations and some only shop in test2 locations. Also to sort by Total SUM of both tables I tried:

ORDER BY SUM(Grocery1 + Grocery2) DESC

No luck.

A SUM select statement (no joins) works:

select junction1.Name1, SUM(Amount) AS Grocery1 
from test1 
FULL JOIN junction1 on junction1.ID1= test1.ID1 
GROUP BY junction1.Name1 ORDER BY Grocery1 DESC;

Part 1

But when I join the table(s):

select junction1.Name1, SUM(test1.Amount) AS Grocery1, SUM(test2.Amount) AS Grocery2
from test1
FULL JOIN junction1 ON test1.ID1 = junction1.ID1
FULL JOIN test2 ON test2.ID2 = junction1.ID2
GROUP BY junction1.Name1

It gives:

Incorrect Results

The data is off in both columns. Andy should only have $400 for Grocery1. It looks like it's multiplying it instead of adding it. I tried to divide by 3, which helps some of the people with 3 entries, but that's probably not what I want.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • How did you create `junction1`? Was it created solely for this query? If so, the logic you used to create the table for joining could probably just be put in the actual query – EdmCoff Nov 18 '22 at 21:27
  • 1
    Please never post [images](https://meta.stackoverflow.com/questions/285551) of code, data or errors. – Stu Nov 18 '22 at 21:33
  • Hello nbk, I am using Microsoft SQL Server Management Studio, v18.12, if that helps... – Greg-learner Nov 18 '22 at 22:30
  • Hello EdmCoff, correct - I created junction1 just as a hopeful bridge, it contains (since I cannot post images?) 3 columns, Name1 (Andy M, Braco T, Charles Z, Phil H), ID1 (15, 16, 17, 18) and ID2 (15, 16, 17, 18) , The data in Columns ID1 and ID2 have the same ID that represent the person in tables test1 and test2, if that makes sense... – Greg-learner Nov 18 '22 at 22:33
  • I guess my question is, is the `name` column on `test1`/`test2` safe to join on, or did more complicated logic go into making the `junction1` table? i.e. If the name is exactly the same in test1 and test2, is it guaranteed to be the same person (and if it doesn't match is it guaranteed to not be the same person)? Can you describe the logic for how you created that table? If this is a real application and those aren't unique usernames, then probably not, but since all you want returned is names and amounts it seems worth asking. – EdmCoff Nov 18 '22 at 22:46
  • Hello EdmCoff, unfortunately the name columns are all different for all of the stores and carriers - some have first then last, some have initials, etc, so joining on that probably won't work, hence I have included the ID columns in the junction table, also on that note (or a side note), although in this example ID1 and ID2 are the same, they will be different in the future, that's why I created the junction table...hope that helps! – Greg-learner Nov 18 '22 at 23:03
  • Please ask 1 specific researched non-duplicate question. Please either ask about 1 bad query with the obligatory [mre] & why you think it should return something else at the 1st subexpression that it doesn't give what you expect, justified by reference to authoritative documentation, or ask about your overall goal giving working parts you can do & ideally a [mre]. But please ask about the former 1st because misconceptions in the former will get in the way of understanding the latter. And bad code doesn't tell us what you wish it would do. [ask] [Help] Please clarify via edits, not comments. – philipxy Nov 19 '22 at 00:10
  • A [mre] includes cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Nov 19 '22 at 00:16
  • @nbk Since you mention it, this is why: [Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/3404097) (That link was already posted above so I didn't repeat it.) [Why are images of text, code and mathematical expressions discouraged?](https://meta.stackexchange.com/q/320052/266284) I'm done now. – philipxy Nov 20 '22 at 01:23

2 Answers2

1

I don't think you want a full join. You would if both tables could be joined together directly, but the fact that we need to to through the junction table changes the situation.

We can start from the junction table, then union both grocery tables (while separating the original columns) and bring them with a left join on both possibles matches. The last step is aggregation:

select j.name1, sum(t.amount1) as grocery1, sum(t.amount2) as grocery2
from junction1 j
left join (
    select id1, null as id2, amount as amount1, null as amount2 from test1
    union all select null, id2, null, amount from test2
) t on t.id1 = j.id1 or t.id2 = j.id2
group by j.name1

Using union avoids the "row multiplication" issue that you are seeing. On the other hand, the left join ensures that records from both tables are preserved, regardless of whether or not the same person actually shopped at both locations.

GMB
  • 216,147
  • 25
  • 84
  • 135
0

A full join will return all records where there is a match on either side of the join. This is causes SQL to return multiple copies of the data for Andy.

Change the query to use an INNER JOIN, which will only produce matching records:

SELECT junction1.Name1, 
       SUM(test1.Amount) AS Grocery1, 
       SUM(test2.Amount) AS Grocery2
  FROM test1
       INNER JOIN junction1 ON test1.ID1 = junction1.ID1
       INNER JOIN test2 ON test2.ID2 = junction1.ID2
 GROUP BY junction1.Name1
Mike Hofer
  • 16,477
  • 11
  • 74
  • 110
  • Hello Mike, Thank you for your post! When I tried INNER JOIN (using your code above), it still returned $1600 for Andy for grocery1 (which it should be $400), and it didn't have Phil H in there (he's only in table test2 and junction1) – Greg-learner Nov 18 '22 at 22:37