1

We're using MariaDb in production and we've added a MariaDb slave so that our data team can perform some ETL tasks from this slave to our datawarehouse. However, they lack a proper Change Data Capture feature (i.e. they want to know which rows from the production table changed since yesterday in order to query rows that actually changed).

I saw that MariaDb's 10.3 had an interesting feature that allowed to perform a SELECT on an older version of a table. However, I haven't found resources that supported the idea that it could be used for CDC, any feedback on this feature?

If not, we'll probably resort to streaming the slave's binlogs to our datawarehouse but that looks challenging..

Thanks for your help!

2 Answers2

2

(As a supplement to Stefans answer)

Yes, the System-Versioning can be used for CDC because the validity-period in ROW_START (Object starts to be valid) and ROW_END (Object is now invalid) can be interpreted when an INSERT-, UPDATE- or DELETE-query happened. But it's more cumbersome as with alternative CDC-variants.

INSERT:

  • Object was found for the first time
  • ROW_START is the insertion time

UPDATE:

  • Object wasn't found for the first time
  • ROW_START is the update time

DELETE:

  • ROW_END lies in the past
  • there is no new entry for this object in the next few lines

I'll add a picture to clarify this.

Example table for system versioning

You can see that this versioning is space saving because you can combine the information about INSERT and DELETE of an object in one line, but to check for DELETEs is costly.

In the example above I used a Table with a clear Primary Key. So a check for the-same-object is easy: just look at the id. If you want to capture changes in talbes with an key-combination this can also make the whole process more annoying.

Edit: another point is that the protocol-Data is kept in the same table as the "real" data. Maybe this is faster for an INSERT than known alternativ solution like the tracking per TRIGGER (like here), but if changes are made quite frequent on the table and you want to process/analyse the CDC-Data this can cause performance problems.

Asriel
  • 81
  • 10
1

MariaDB supports System-Versioned Tables since version 10.3.4. System version tables are specified in the SQL:2011 standard. They can be used for automatically capturing previous versions of rows. Those versions can then be queried to retrieve their values as they have been set at a specific point in time.

The following text and code example is from the official MariaDB documentation

With system-versioned tables, MariaDB Server tracks the points in time when rows change. When you update a row on these tables, it creates a new row to display as current without removing the old data. This tracking remains transparent to the application. When querying a system-versioned table, you can retrieve either the most current values for every row or the historic values available at a given point in time.

You may find this feature useful in efficiently tracking the time of changes to continuously-monitored values that do not change frequently, such as changes in temperature over the course of a year. System versioning is often useful for auditing.

With adding SYSTEM VERSIONING to a newly created or an already existing table (using ALTER), the table will be expanded by row_start and row_end time stamp columns which allow retrieving the record valid within the time between the start and the end timestamps.

CREATE TABLE accounts (
       id INT PRIMARY KEY AUTO_INCREMENT,
       name VARCHAR(255),
       amount INT
) WITH SYSTEM VERSIONING;

It is then possible to retrieve data as it was at a specific time (with SELECT * FROM accounts FOR SYSTEM_TIME AS OF '2019-06-18 11:00';), all versions within a specific time range

SELECT * FROM accounts
FOR SYSTEM_TIME
   BETWEEN (NOW() - INTERVAL 1 YEAR)
   AND NOW();

or all versions at once:

SELECT * FROM accounts
FOR SYSTEM_TIME ALL;
Stefan
  • 679
  • 2
  • 9
  • 21
  • Thank you for your answer. It looks indeed possible, your answer mentions that `row_start` and `row_end` are exposed on the new tables making it possible to run some incremental SQL queries and perform CDC. I'd appreciate any practical feedback though on the reliability of setting this up on a MariaDb slave in order to run some ETL to another datawarehouse. – MissouPower Aug 05 '20 at 05:32