0

I have a simple data structure of 2 tables. First table holds json documents

CREATE TABLE public."TableData" (
    dataid integer,
    tableid integer,
    data jsonb,
    "addedOn" date,
    ownerid integer,
)

Here is sample of test json data in table

dataid  data
1       "{"email": "lbush@gmail.com", "items": {"qty": 24, "product": "Diaper"}, "lastname": "Bush", "firstname": "Lily"}"  "2019-10-29"    1
5       "{"city": "Boston", "state": "MA", "rating": 1, "school": "Harvard"}
12      "{"major": "English", "credits": 350, "field of study": "Aerospace Engineering"}"

The second table is used to join documents together. The TableData.dataid connects to "left_id" or "right_id" depending on the connection.

CREATE TABLE public.relations (
    relationid integer
    left_id integer,
    right_id integer,
    "addedOn" date,
    ownerid integer
)

here is 2 rows that connect the 3 documents

          relationid    left_id     _right_id   
            1               1       5   
            2               5       12

I am looking for the right query syntax that lets me connect any hierarchy of documents together such that I can query/filter and have the output show correctly. For example, here is a query that works when just linking the first two documents together.

     SELECT Contact.data -> 'firstname' as firstname,School.data -> 'school' as schoolname
      from "public"."relations" as connections 
      left join "public"."TableData" as Contact
      on Contact.dataid = connections.left_id
      left join "public"."TableData" as School
      on School.dataid = connections.right_id

This correctly outputs:

   firstname    schoolname
   "Lily"       "Harvard"

But, how do I now extend this to include additional connections? I tried the following to link the 3rd document, but not getting correct output.

    SELECT Contact.data -> 'firstname' as firstname,
    School.data -> 'school' as schoolname,
    Major.data -> 'major' as majorname
    from "public"."relations" as connections
    left join "public"."TableData" as Contact
    on Contact.dataid = connections.left_id
    left join "public"."TableData" as School
    on School.dataid = connections.right_id
    left join "public"."TableData" as Major
    on major.dataid = connections.right_id 
    left join "public"."TableData" 
    on School.dataid = connections.left_id

here is incorrect output:

        firstname   schoolname    major
        "Lily"      "Harvard"
        null         null       "Aerospace Enginering"

What I want/am looking for is:

       firstname    schoolname  major
       "Lily"       "Harvard"   "Aerspace Engineering"

How do I string multiple joins to the "Relations" table in order to get this output??

GMB
  • 216,147
  • 25
  • 84
  • 135
mike hennessy
  • 1,359
  • 1
  • 16
  • 35
  • Why do you have such a weird schema? Can't you properly normalize it so that you'd have separate tables instead of a table of json object that is related to itself? – Bergi Oct 29 '19 at 20:24
  • The goal of this schema is to support truly dynamic data structures defined by end-users. There is endless discussions on how best to support dynamic data structures, the old/best known method EAV. Great article showing the discussion. https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields I believe with json/jsob, there is a better model, but I need to support linking of data elements between them. Any ideas on how to get this query to work? – mike hennessy Oct 29 '19 at 20:31
  • I see what you did there, but why not just make your schema changeable by end-users? You may still use JSON or EAV for primitive attribute values, but as soon as you need to have links between your entities and do queries on them I would recommend to create a table per user-defined table and a table (or column) per user-defined relation, so that you can set up foreign keys properly. – Bergi Oct 29 '19 at 20:45
  • We've discussed that, but just don't think it's feasible for scalability and maintainability. Think tens of thousands of users creating hundreds of thousands of tables. – mike hennessy Oct 29 '19 at 20:46
  • Of course it also depends on how large the tables and how complex the queries on them are, but I'd wager that a purpose-built RDMS scales better to those dimensions than anything you can implement and maintain yourself. On top of that, it already brings most of the tools that you would need. – Bergi Oct 29 '19 at 20:51
  • So, just so I understand, you would advocate for creating custom table/fields dynamically as users define their table structures? The best use-case I can give is something equivalent to Airtable if you're familiar with them. They basically have a spreadsheet type of app that allows for complete customization. Do you think creating tables/fields could be supported at that scale on postgres? The tables/queries will be rather simple and small, just LOTS of them. – mike hennessy Oct 29 '19 at 20:54
  • Yes, that's what I would recommend. I mean, I haven't tried it myself yet, but I would definitely try to make a prove-of-concept with that approach and then benchmark it. Having many small tables is also much easier (and more efficient) to scale horizontally, so I'm not too concerned about postgres dealing with this. – Bergi Oct 29 '19 at 20:58

1 Answers1

0

You would need to use the relations table again to do another N:N join, and drop the anonymous TableData table:

SELECT
  Contact.data -> 'firstname' as firstname,
  School.data -> 'school' as schoolname,
  Major.data -> 'major' as majorname
FROM
  "public"."TableData" as Contact
  INNER JOIN "public"."relations" as connection1 ON (Contact.dataid = connection1.left_id)
  INNER JOIN "public"."TableData" as School ON (connection1.right_id = School.dataid)
  INNER JOIN "public"."relations" as connection2 ON (School.dataid = connection2.left_id)
  INNER JOIN "public"."TableData" as Major ON (connection2.right_id = Major.dataid)

I've used inner joins for simplicity, not sure if you want to use outer joins and how to group them.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Thx! I think there might be a typo? It complains about the "connections" in the bottom line not being defined. Is that supposed to be "as connections3 on (major.right_id-connection3.leftid) ? – mike hennessy Oct 29 '19 at 20:36
  • nm, I figutred it was INNER JOIN "public"."TableData" as Major ON (connection2.right_id = Major.dataid). that works now, thx so much! – mike hennessy Oct 29 '19 at 20:45
  • @mikehennessy Fixed the typo, thanks for mentioning! – Bergi Oct 29 '19 at 20:46