2

We are currently working on a project where we need to check if the database schema has changed everytime we start a Spoon job, since our origin is a third party database that we have little to no control.

The most obvious solution to us would be to create a script that would call a tool like apgdiff and then compare the schema to a previous generated schema file. If there was any change, we would then send a notification.

The question is basically: is this the best way to achieve this?

Any help would be appreciated.

Thanks for your time.

P.S.: I'm not sure whether stackoverflow is the best place for this kind of question so, if not, please feel free to suggest any interesting forum.

Isabelle
  • 631
  • 1
  • 7
  • 14

1 Answers1

1

Solution I: Assuming it is a PostgreSQL database you are referring to, if you have sufficient privileges to INFORMATION_SCHEMA, I would suggest that you query the database like so:

select column_name, data_type, character_maximum_length
from INFORMATION_SCHEMA.COLUMNS where table_name = '<name of table>';

Store the expected result in a persistent way, like you mentioned, and then just compare the results in a sub-transformation. The persistent schema could be a CSV file that stores the definition like so:

app_id           character varying           255                      
platform         character varying           255                      
etl_tstamp       timestamp without time zone (null)                   
collector_tstamp timestamp without time zone (null)                   
dvce_tstamp      timestamp without time zone (null)                   
event            character varying           128                      
event_id         character                   36                       

Then just simply compare the two files: (1) the file that holds the expected schema definition and (2) the file that you just generated, fresh from the database. You could use the File Compare step to do so:

Example - PDI

I hope this helps a bit.

EDIT:

Solution II: Another solution you could apply: you can also use the Table Compare step (contributed by www.kjube.de) to compare two tables from different sources.

What's nice about this step is that you can specify two different connections for the two tables you are comparing.

Yuval Herziger
  • 1,145
  • 2
  • 16
  • 28