0

We're using a "1 audit table for each monitored Table" design; However, in our case emp(PARENT) table has a child table emp_address which also needs to be monitored, so we have emp_audit and emp_addr_audit tables.

postgres audit SQL : how to join PARENT and CHILD tables for reporting purposes.

/* Employee table */    
create table emp (
 emp_id integer primary key,
 empnum  integer,
 empname varchar(50),
);

/* Address table */    
create table emp_addr (
 addr_id integer primary key,
 emp_id integer, -- references table emp
 line1 varchar(30),
);

/* Audit table for emp table */    
create table emp_audit (
 operation   character(1),
 emp_id integer,
 empnum  integer,
 empname varchar(50),
 updatetime timestamp,
 txid bigint
);

/* Audit table for emp_addr table */    
create table emp_addr_audit (
 operation   character(1),
 addr_id integer,
 emp_id integer,
 line1 varchar(30),
 updatetime timestamp,
 txid bigint
);

We're using hibernate(java) for persistence and hibernate updates only those tables whose columns were modified in the update operation. Given this, I might have multiple(say, 5) rows in the emp_addr_audit table for 1 row in emp_audit table. And vice-versa as well.

The report needs 1 row for each transaction(modification). The report will have the following columns

empname, line1, operation(insert/delete/update), updatetime

Let's consider 2 scenarios to understand what's needed:

  1. In the initial transaction only emp attributes are created. Then in a separate transaction, the corresponding row in emp_addr is created. So, now, we have 1 row in emp_audit table and 1 row in emp_addr_audit table. The report will have 2 rows (one each for each transaction).
  2. Both emp and emp_addr attributes are created in a single transaction. This will ensure that there is 1 row in emp_audit and 1 row in emp_addr_audit. Now, the report will have ONLY 1 row (since both table rows were created in a single transaction).

Scenario :
Transaction #1 : I insert a row into both emp and emp_addr. This results in a row each in emp_audit and emp_addr_audit.(INSERT)
Transaction #2 : I update the above emp' attribute. This results in a UPDATE row in emp_audit.
Transaction #3 : I update the above emp_addr's attribute. This results in a UPDATE row in emp_addr_audit.

I tried the following SQL #1 and it returned 3 rows (as expected);

SQL #1

SELECT emp.*, addr.*
 FROM  emp_audit emp 
 FULL OUTER JOIN emp_addr_audit addr USING(emp_id, txid);

However, when I added a where clause to the SQL, it returns only 2 rows. The missing row was the result of Transaction #3, where only emp_addr table row was UPDATED and emp table row was untouched.
SQL #2

SELECT emp.*, addr.*
 FROM  emp_audit emp 
        FULL OUTER JOIN emp_addr_audit addr USING(emp_id, txid);
WHERE  emp.empnum = 20;

What SQL will STILL be able to get me 3 rows for the 3 transactions so that I can still filter out based on empnum ?

anjanb
  • 12,999
  • 18
  • 77
  • 106

1 Answers1

1

The FULL JOIN will create a table from both tables and fill in empty (non matching) rows with NULL values. When you select only the rows with emp.empnum = 20 it will of course not return the last row that has NULL in emp.empnum.

Maybe you can JOIN only on emp_id and then do some GROUP BY txid. But this wont be so easy. You might have to make always an update on both tables. Or use some more advanced logic (something like: select all transaction of one emp_id where one side is null until the next full row and fill the missing parts with the last full row).

morja
  • 8,297
  • 2
  • 39
  • 59