Streaming Data from CockroachDB to Redpanda

This blog post explains how to create changefeeds in CockroachDB and stream them into Redpanda from Vectorized (vectorized.io).

Why would you want to do this?

There are a number of reasons, but first it's important to understand that CockroachDB is an excellent OLTP (transactional) database. It is not a great fit for heavy analytical workloads such as OLAP use cases.
The main reason to stream data out of CockroachDB is to feed a BI/OLAP data store. Other use cases include implementing the outbox pattern and building an event streaming platform.

Set up and configure CockroachDB

  1. The first step is to install and configure CockroachDB. You can use this blog post as a reference to install and configure CockroachDB in your environment of choice.

  2. Because CDC is an enterprise feature, you will also need to apply a license key. You can get a free 30-day trial license from Cockroach Labs.

  3. Once you have the key, activate it with the following statements:

SET CLUSTER SETTING cluster.organization = 'xxxxx';
SET CLUSTER SETTING enterprise.license = 'xxxxxx';
  4. An alternative approach for testing is to use the cockroach demo command. This spins up an in-memory CockroachDB cluster on your laptop with a pre-applied license key and a populated sample dataset called MovR.
cockroach demo --nodes 3

Set up Redpanda

To run Redpanda on macOS, we'll use rpk to bring up Redpanda nodes in Docker containers. Make sure you install Docker first.
Instructions for other operating systems can be found here.

  1. Download and install Redpanda using brew.
brew install vectorizedio/tap/redpanda
  2. Start a three-node Redpanda cluster:
rpk container start -n 3
  3. Get the cluster information, including the broker addresses, by running:
rpk cluster info

Configure the Changefeeds in CockroachDB

  1. First, enable rangefeeds, which changefeeds depend on, through the kv.rangefeed.enabled cluster setting. Connect to your CockroachDB cluster if you are not already connected and run:
SET CLUSTER SETTING kv.rangefeed.enabled = true;
  2. Create a table:
CREATE TABLE foo (id UUID PRIMARY KEY, name STRING);
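  3. Create the changefeed, pointing it at the broker address reported by rpk cluster info (port 51066 in this example; yours will likely differ). The updated option adds each row's update timestamp to its messages, and resolved = '5s' emits resolved-timestamp messages roughly every five seconds:
CREATE CHANGEFEED FOR TABLE foo INTO 'kafka://localhost:51066' WITH updated, resolved = '5s';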
  4. Tail the topic in Redpanda. By default, the changefeed writes to a topic named after the table:
rpk topic consume foo
  5. Insert some data:
INSERT INTO foo VALUES (gen_random_uuid(), 'Daniel');
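  6. Observe the message arriving in Redpanda. Its value contains the new row under after, along with the updated timestamp; separate resolved messages also appear periodically.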
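Each changefeed message value is JSON. If you later consume the topic from your own code instead of rpk, a minimal Python sketch like the one below can classify each message. It assumes the default JSON format produced with the updated and resolved options used above: row messages carry the new row under after (null for a delete) plus an updated timestamp, resolved messages carry only a resolved timestamp, and the message key is a JSON array of primary-key values. Timestamps are strings in CockroachDB's HLC format (nanoseconds since the Unix epoch, then a decimal logical counter), so they are parsed as Decimal to keep exact ordering. The names RowEvent, ResolvedEvent, parse_message, and hlc_to_datetime are hypothetical, and fetching the raw bytes from Redpanda (for example with a Kafka client library) is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional, Union


# Hypothetical event types for this sketch; not part of any CockroachDB API.
@dataclass
class RowEvent:
    key: list              # primary-key values, decoded from the JSON array key
    after: Optional[dict]  # new row state; None when the row was deleted
    updated: Decimal       # HLC timestamp added by the `updated` option


@dataclass
class ResolvedEvent:
    resolved: Decimal      # HLC timestamp from a `resolved` message


def hlc_to_datetime(hlc: Decimal) -> datetime:
    """Convert the wall-clock part of an HLC timestamp to a UTC datetime.

    The integer part is nanoseconds since the Unix epoch; the fractional
    part is a logical counter, which is dropped here.
    """
    nanos = int(hlc)
    seconds, rem = divmod(nanos, 10**9)
    # datetime only has microsecond precision, so the last 3 digits are truncated.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=rem // 1000)


def parse_message(key: Optional[bytes], value: bytes) -> Union[RowEvent, ResolvedEvent]:
    """Classify one raw changefeed message (key and value as bytes)."""
    payload = json.loads(value)
    if "resolved" in payload:
        return ResolvedEvent(resolved=Decimal(payload["resolved"]))
    return RowEvent(
        key=json.loads(key) if key else [],
        after=payload.get("after"),
        updated=Decimal(payload["updated"]),
    )
```

For example, the insert from step 5 would produce a RowEvent whose after field holds the id and name columns, while the periodic resolved messages produce ResolvedEvent values.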

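The resolved = '5s' option is what lets a downstream consumer, such as the BI/OLAP store mentioned earlier, know when it has seen every change up to a point in time. Changefeeds do not guarantee timestamp order across different rows, and delivery is at-least-once, so duplicates are possible. A common pattern, sketched below in Python, is to buffer rows, track the latest resolved timestamp per partition, and release buffered rows sorted by updated once they are at or below the minimum resolved timestamp across all partitions. The ResolvedBuffer class is hypothetical; it assumes the topic's partition numbers are known up front (pass partitions=[0] for a single-partition topic) and that messages have already been decoded into a string key and Decimal timestamps.

```python
from decimal import Decimal
from typing import Dict, Iterable, List, Set, Tuple

# A released row: (updated timestamp, primary key, row state from `after`).
Row = Tuple[Decimal, str, object]


class ResolvedBuffer:
    """Hypothetical helper: hold changefeed rows until a resolved timestamp covers them.

    A resolved timestamp on a partition means no previously unseen row with an
    earlier `updated` timestamp will arrive on that partition, so rows at or
    below the minimum resolved timestamp across all partitions can be released.
    """

    def __init__(self, partitions: Iterable[int]):
        # Start every known partition at 0 so nothing is released until
        # each partition has reported at least one resolved timestamp.
        self.resolved: Dict[int, Decimal] = {p: Decimal(0) for p in partitions}
        self.pending: List[Row] = []
        self.seen: Set[Tuple[str, Decimal]] = set()

    def add_row(self, key: str, updated: Decimal, after: object) -> None:
        # Delivery is at-least-once: skip a (key, updated) pair seen before.
        # Note: this set is never pruned in this sketch.
        if (key, updated) in self.seen:
            return
        self.seen.add((key, updated))
        self.pending.append((updated, key, after))

    def add_resolved(self, partition: int, ts: Decimal) -> List[Row]:
        """Record a resolved timestamp and return newly complete rows in order."""
        self.resolved[partition] = max(self.resolved[partition], ts)
        frontier = min(self.resolved.values())
        ready = sorted(
            (row for row in self.pending if row[0] <= frontier),
            key=lambda row: (row[0], row[1]),
        )
        self.pending = [row for row in self.pending if row[0] > frontier]
        return ready
```

Keying the duplicate check on (key, updated) assumes a re-delivered change carries its original updated timestamp. Because the seen set grows without bound here, a long-running consumer would need its own pruning or persistence strategy.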
Clean up

  1. If you are using cockroach demo, exit CockroachDB by typing quit at the SQL prompt.
  2. Shut down Redpanda by running rpk container purge

Conclusion

Hopefully this post helps you get started with CockroachDB and Redpanda in your application workflow. More information on CDC with CockroachDB can be found here, and more information on Redpanda can be found here.