I have an SQLite database (v. 3.8.1) with somewhat unusual schema that can't be changed.
For the purposes of this questions, there are 5 tables (t1 through t5), and I need to create a summary report using data from t1 and t5, but the data I need to reference in t5 can only be gleaned based on relationships to records in t1 through t4.
To help clarify - imagine that t1 holds data regarding a document. The document can subsequently go through 1 to 4 more iterations (with different fields available in each iteration, hence the 5 different tables rather than just a flag in 1 table to signify what iteration it is at).
I'm interested in whether or not an initial record/document (held in t1) has reached it's final iteration or not (a ParentGUID exists in t5 that when followed up the chain of tables, eventually reaches t1, or not).
t1 has a GUID (text) field, and t2 through t5 have GUID and ParentGUID fields (also text). The ParentGUID field in t2 through t5 don't have to be populated (documentation iterations can be skipped in some cases), but when ParentGUID has a value it will always be a GUID from a previous table (for example, if t5 has a ParentGuid value, it will be a GUID from t1, t2, t3 OR t4).
This means that I want all of the distinct records from t1, and then for each a value (or values) from t5 if present, or null if not.
If a ParentGuid field value in a t5 record is the GUID of a record in t4, and the ParentGuid field value in that t4 record is the GUID of a record in t1, then that particular t1 record is considered to have reached its final iteration.
Similarly, ParentGUID > GUID links that will be considered t1 > t5, initial > final iterations include:
t1 > t2 > t3 > t4 > t5
t1 > t2 > t3 > t5
t1 > t2 > t4 > t5
t1 > t2 > t5
t1 > t3 > t4 > t5
t1 > t3 > t5
t1 > t4 > t5
t1 > t5
Or represented graphically:
Consider the following test schema:
CREATE TABLE Table1
("GUID" TEXT, "Name" TEXT)
;
CREATE TABLE Table2
("GUID" TEXT, "ParentGUID" TEXT)
;
CREATE TABLE Table3
("GUID" TEXT, "ParentGUID" TEXT)
;
CREATE TABLE Table4
("GUID" TEXT, "ParentGUID" TEXT)
;
CREATE TABLE Table5
("GUID" TEXT, "Name" TEXT, "Amount" REAL, "ParentGUID" TEXT)
;
INSERT INTO Table1
("GUID", "Name")
VALUES
('ABC', 'A1')
;
INSERT INTO Table1
("GUID", "Name")
VALUES
('DEF', 'A2')
;
INSERT INTO Table1
("GUID", "Name")
VALUES
('GHI', 'A3')
;
INSERT INTO Table2
("GUID", "ParentGUID")
VALUES
('JKL', 'GHI')
;
INSERT INTO Table2
("GUID", "ParentGUID")
VALUES
('MNO', '')
;
INSERT INTO Table2
("GUID", "ParentGUID")
VALUES
('PQR', 'GHI')
;
INSERT INTO Table3
("GUID", "ParentGUID")
VALUES
('STU', 'MNO')
;
INSERT INTO Table3
("GUID", "ParentGUID")
VALUES
('STU', 'GHI')
;
INSERT INTO Table3
("GUID", "ParentGUID")
VALUES
('VWX', 'PQR')
;
INSERT INTO Table4
("GUID", "ParentGUID")
VALUES
('YZA', 'VWX')
;
INSERT INTO Table4
("GUID", "ParentGUID")
VALUES
('BCD', '')
;
INSERT INTO Table4
("GUID", "ParentGUID")
VALUES
('EFG', 'GHI')
;
INSERT INTO Table5
("GUID", "ParentGUID", "Amount", "Name" )
VALUES
('HIJ', 'EFG', -500, 'E3')
;
INSERT INTO Table5
("GUID", "ParentGUID", "Amount", "Name" )
VALUES
('KLM', 'YZA', -702, 'E2')
;
INSERT INTO Table5
("GUID", "ParentGUID", "Amount", "Name" )
VALUES
('NOP', '', 220, 'E8')
;
INSERT INTO Table5
("GUID", "ParentGUID", "Amount", "Name" )
VALUES
('QRS', 'GHI', 601, 'E4')
;
What I'd like to do is get all records in t1, and then show the total of all related Amount fields from t5 (related in any of the ways listed above), and the group_concat of all the related Name fields from t5.
Using the above sample schema, it would look something like:
t1.Name total(t5.Amount) group_concat(t5.Name)
--------------------------------------------------
A1 0.00
A2 0.00
A3 -601.00 E2,E3,E4
I've tried a bunch of different joins but nothing has worked...either I get too many items in my Total/Group_Concat cells (too high a total due to adding items multiple times, and multiple repeat names such as "E4,E4,E4,E4,E2,E3,E3,E4,E4..."), or I can only get the one directly linked item to t1 from t5 (601.00, E4).
For example, the query just gives me the E4/601.00 result for t1 record GHI:
SELECT DISTINCT t1.guid "OriginalGuid", t1.name "OriginalName", TOTAL(t5."Amount") as "TotalAmount", group_concat(t5.Name) AS "FinalNames"
FROM
Table1 t1
LEFT JOIN Table5 t5 ON (t1.GUID=t5.ParentGUID)
LEFT JOIN Table4 t4 ON (t1.GUID=t4.ParentGuid AND t5.ParentGuid=t4.Guid)
LEFT JOIN Table3 t3 ON (t1.GUID=t3.ParentGuid AND (t4.ParentGuid=t3.Guid OR t5.ParentGuid=t3.Guid))
LEFT JOIN Table2 t2 ON (t1.GUID=t2.ParentGuid AND (t3.ParentGuid=t2.Guid AND ((t4.ParentGuid=t3.Guid And t5.ParentGuid=t4.guid) or (t5.ParentGuid=t3.Guid)) OR (t4.ParentGuid=t2.Guid and t5.ParentGuid=t4.Guid) OR (t5.ParentGuid=t2.Guid)))
GROUP BY t1.GUID;
Sadly, I've been working on this for a good chunk of the weekend, and I haven't been able to figure out something that works and is reasonably performant (I had something that appeared to work okay for small datasets but took minutes against my full dataset, which is just too long - I've since lost the SQL for that though unfortunately).
I'm continuing to work on a solution now, and if I find it I will post the answer here, but I'd greatly appreciate any assistance/ideas if they're out there!
Here's my SQL Fiddle: http://sqlfiddle.com/#!5/1a2ac/55
Thanks in advance for any help.