0

We have small array of gpdb and pivotal hadoop.We are trying to do CDC Using gpdb.

WE are using SQL server 2012 as external Data source. we have read only access of SQL SERVER from client's END. Hence ,SQL server does not allow to capture CDC mechanism for Data Integration tool.

We have to Implement CDC in greenplum. So that it brings only changed row Instead of Full Load every time. which also burdens SQL server on every schedules of Batch Job.

We tried on Following SQL

This is complete CDC query which decides action of each row

select * from (select *,case when ( a.ID is null) then 'I'
when ( a.ID is not null and  a.SalesOrder = a.SalesOrder and  a.SalesOrderLine = a.SalesOrderLine) then 'N'
when ( a.ID is not null and ( a.SalesOrder != a.SalesOrder or  a.SalesOrderLine != a.SalesOrderLine)) then 'U'
else 'X'
end as increment_action from tablename)

We also tried to understand using Link:

Similarly, I have tried on around dozen of link from Talend. But it is asking for pub/sub mechanism Which only supports when SQL server CDC feature is on. Hence Unable to do it from talend's way.

https://www.pivotalguru.com/?page_id=35

Is there any way to Implement Incremental Load Instead of Full Load in gpdb USING SQL. AND is there any way to convert SQL SERVER DDL syntax into Greenplum.?

Is there step by step documentation or guidance to implement CDC in gpdb ?

Any help on it would be much appreciated ?

NEO
  • 389
  • 8
  • 31

1 Answers1

2

Outsourcer does CDC and it is open source. You can either use Outsourcer or review the code to see how it is done and recreate it with your own code.

Outsourcer Replication

Source Code

Jon Roberts
  • 2,068
  • 1
  • 9
  • 11
  • it would be great mechanism . Does it work also with hawq ?. I have also searched out but is there any real time demo. of CDC using HAWQ/GREENPLUM. – NEO Jun 07 '16 at 03:53
  • Outsourcer works with HAWQ but replication/CDC does not. CDC requires the ability to DELETE from a table which is not supported in HAWQ because of HDFS. I will put together a quick demo of Outsourcer Replication soon. – Jon Roberts Jun 07 '16 at 14:45
  • Jon it is incredible thing . I mean what you have done is very helpful for the person who is using full load and wanted to shift for Incremental LOAD. Thanks for it . – NEO Jun 07 '16 at 15:23
  • @vickps thanks! I just created a demo of using Outsourcer Replication and it is now available on my site. http://www.pivotalguru.com – Jon Roberts Jun 07 '16 at 18:21
  • please on chat section – NEO Jun 07 '16 at 19:11