[Version française disponible ici]
Recently, CockroachLabs launched CockroachDB 22.2 with a lot of new features.
To name a few, you'll find User Defined Functions, Time-To-Live at row level, and CDC (Change Data Capture) with transformations and filtering. Any of these features is great improvement, but can we get more by combining all of them?
Example use case
The use case we'll discuss will be an easy to understand one, but it'll demonstrate the power of all the recent improvements.
We start with an example you can find multiple times in the doc, transferring funds between accounts. We won't simply transfer funds. We now want to integrate a fraud detection mechanism (don't expect to run a fintech with this kind of fraud detection!).
Here is the schema to use to run this example.
We create a table for accounts and transfers. The interesting part is on anomalies and blocked_accounts. For those table we use the TTL feature. On anomalies, we keep track of the anomaly level (more on this later) of every transfer for 1 minute.
The blocked_accounts will help in blocking transfer if the source account seems suspicious. We remove lines every 10 minutes.
For every transfer done, we want to emit a change with transfer id, source account, and destination account. This will be done by using a CDC transformations like this:
The CDC stream will send every transfer higher than 100 with only the id, source and destination.
Before creating the CDC, don't forget to enable KV rangefeed in your cluster:
We also use a User Defined Function to "compute" the anomaly level:
The function will use a simple logic to determine anomaly level based the amount of the transfers.
Now we've setup all those features and everything is ready to setup, we can now run the logic.
Our program will do random transfers. It randomly selects 2 accounts, a random amount, and finally try to achieve the transfer.
The transfer will check the available balance of the account and if the account is in the blocked list.
IMPORTANT: don't forget the CDC is running on your cluster. So don't use "localhost" when running CockroachDB in Serverless or Dedicated. Use a proper server name your endpoint could be reached on.
The code also starts a web server to receive the CDC stream (remember the localhost in the CDC creation part). For every transfers done, we receive the id, source and destination from the transfers table if, and only if, the amount is higher than 100.
From there, we run the logic we want.
We call our UDF (User Defined Function), to get the anomaly level of this transfer. After we add this level into the anomalies table to be able to track all "suspicious" transfer from this account. Remember, with TTL, an anomaly will remain in the table only for 1 minute.
Then we compute a risk rate of the account to determine if we need to insert it in the blocked accounts list:
The rate is simply computed by counting transfers flagged as warning and 5 times transfers flagged as alert. This is where the TTL is really important as you don't want to keep all those anomaly level to prevent blocking an account indefinitely!
As mentioned, this is an easy example without a real logic for fraud detection. But the idea is here. CDC, UDF, and TTL are great features and even better when working together.
You can imagine a lot more by following those easy steps like, rate-limiting feature, enabling parameters computation on near real-time for ML algorithms, etc.
Event-Driven design is now also easier to implement.
You can have a look at the source code used here.