Was this page helpful?
CDC is a feature that allows you to not only query the current state of a database’s table, but also query the history of all changes made to the table.
Note
CDC is production-ready (GA) from Scylla Enterprise 2021.1.1 and Scylla Open Source 4.3 In versions prior to Scylla Open Source 4.3, CDC is an experimental feature and you need to enable it in order to use it. See this procedure.
As an example, suppose you made a sequence of changes to some table in the given order:
UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0;
UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1;
Normally, querying the table would return
pk | ck | v
----+----+---
0 | 0 | 2
0 | 1 | 0
(2 rows)
but with CDC, you can also learn the history of all changes:
change at 2020-01-29 14:37:32: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
change at 2020-01-29 14:37:33: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
change at 2020-01-29 14:37:35: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0; <- latest change
change at 2020-01-29 14:37:38: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
change at 2020-01-29 14:37:39: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
change at 2020-01-29 14:37:40: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1; <- latest change
(not an actual syntax, the above example just presents the general concept).
Some examples where CDC may be benificial:
Heterogeneous database replication: applying captured changes to another database or table. The other database may use a different schema (or no schema at all), better suited for some specific workloads. An example is replication to ElasticSearch for efficient text searches.
Implementing a notification system.
In-flight analytics: looking for patterns in the changes in order to derive useful information, e.g. for fraud detection.
In Scylla CDC is optional and enabled on a per-table basis. The history of changes made to a CDC-enabled table is stored in a separate associated table.
Base Table - this is the original table, where all changes are made.
Log Table - this is the table associated to the base table which is created when CDC is enabled. Read about it in the log table document.
You can enable CDC when creating or altering a table using the cdc
option, for example:
CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck, v)) WITH cdc = {'enabled':true};
Note
If you enabled CDC and later decide to disable it, you need to stop all writes to the base table before issuing the ALTER TABLE ... WITH cdc = {'enabled':false};
command.
The following table contains parameters available inside the cdc
option:
Parameter name |
Definition |
Default |
---|---|---|
enabled |
If true, the log table is created and each base table write will get a corresponding log table write: a delta row. The delta row describes “what has changed”. |
false |
preimage |
If true, each base write will get a corresponding preimage row in the log table. Preimage rows exist to show the affected row’s state prior to the write. The amount of information can be changed: |
false |
postimage |
If true, each base write will get a corresponding postimage row in the log table. Postimage rows exist to show the affected row’s state after to the write. The postimage row always contains all the columns no matter if they were affected by the change or not. Note that postimages, similarly to preimages, are costly: they require an additional read-before-write. However, if you enable both preimage and postimage, only one read will be required for both of them. |
false |
delta |
If |
full |
ttl |
Each log table row has a TTL (time-to-live) set on each of its columns, so that the log doesn’t grow endlessly. This option specifies what the TTL should be in seconds; the default is 86400 seconds (24 hours). You can also set it to 0, which means that the TTL won’t be set, thus log rows won’t be removed. Be careful however: in that case the log will consume more and more disk space. You will probably want to setup a separate cleaning mechanism if you set TTL to 0. |
86400 |
When writing applications, you can now use our language specific libraries to simplify writing applications which will read from Scylla CDC. The following libraries are available:
Scylla University: Change Data Capture (CDC) lesson - Learn how to use CDC. Some of the topics covered are:
An overview of Change Data Capture, what exactly is it, what are some common use cases, what does it do, and an overview of how it works
How can that data be consumed? Different options for consuming the data changes including normal CQL, a layered approach, and integrators
How does CDC work under the hood? Covers an example of what happens in the DB on different operations to allow CDC
A summary of CDC: It’s easy to integrate and consume, it uses plain CQL tables, it’s robust, it’s replicated in the same way as normal data, it has a reasonable overhead, it does not overflow if the consumer fails to act and data is TTL’ed. The summary also includes a comparison with Cassandra, DynamoDB, and MongoDB.
Was this page helpful?