Deploying resilient Camunda against CockroachDB

The Background:

Camunda is a Java-based BPMN engine for workflow and process automation. It allows organisations to easily design and automate processes, whether these are human workflows or flows orchestrating a microservices architecture.

Although Camunda can be consumed as SaaS, there is also the option to deploy it on customer-managed hardware. For this, Camunda ships with its own lightweight Java-based H2 database.

Recently, while working with one of our customers, there was a requirement for a resilient, multi-cloud deployment to protect the business against outages in any single cloud provider, as well as for the capability to scale the solution vertically and horizontally as the business grows.

Out of the box, Camunda deploys against the Java-based H2 database, which has many advantages: it is fast, simple and has a very small footprint.

Single node Camunda implementation

With this approach, however, there are significant issues with scaling CPU and memory as the workload grows, because there are limits to what a single node can support. In addition, the solution is fragile: a single failure can take down the whole deployment.

In order to overcome these limitations, it is necessary to enable the Camunda engine to scale horizontally and place it behind a load balancer. To avoid ending up with a series of separate, independent engines, this requires a single shared persistence layer.

Load balanced Camunda

The Problem:

The single persistence layer must be a relational SQL database. In addition, Camunda requires READ COMMITTED transaction isolation and synchronous replication to all active cluster nodes. (Aside: a basic overview of transaction isolation levels can be found here.)

While this is commonly available from local clusters, running distributed clusters across geographically separated cloud environments quickly becomes difficult, complex and expensive.

Ideally, the persistence layer should scale vertically and horizontally with the Camunda front ends, be resilient to outages of any single component, span multiple clouds, and be manageable, upgradeable and maintainable without impacting application operation.

The Solution:

Fortunately, CockroachDB does fulfil all these criteria, and has:

  • SQL Interface
  • Multi-Active read/write
  • ACID transactions
  • Serialisable Isolation
  • Multi-cloud
  • Horizontal Scaling
  • Online rolling upgrades

CockroachDB is a distributed relational database that is wire-compatible with Postgres. Because CockroachDB uses well-understood, standardised SQL, it is easy to migrate applications built for a relational database and gain resilience, as well as scale-up/scale-out capabilities.

CockroachDB is used just like any other Postgres database; however, it utilises a distributed SQL layer over shared-nothing key-value storage and the Raft consensus protocol to provide serialisable ACID transactions.
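Because of this compatibility, any standard Postgres client or driver can be used against CockroachDB. A short illustrative session is sketched below (the table and data are made up purely for demonstration):

-- CockroachDB runs transactions at SERIALIZABLE isolation by default
SHOW transaction_isolation;

-- ordinary Postgres-style DDL and DML work unchanged
BEGIN;
CREATE TABLE IF NOT EXISTS demo (id INT PRIMARY KEY, note TEXT);
INSERT INTO demo (id, note) VALUES (1, 'hello');
COMMIT;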

How to do it:

What we need to provide is an architecture that has no single point of failure. This can be achieved by placing multiple Camunda nodes behind a load balancer, which then connect to multiple database nodes acting as a single logical database. These Camunda and database nodes can be split between regions to provide regional resilience or, if deployed across different clouds, cloud resilience, as shown:

Initially we need to set up CockroachDB. There are two main ways to do this: the managed service at https://cockroachlabs.cloud/ (with both free and paid versions), or a self-managed cluster on some other infrastructure, following the instructions here. If using a self-managed cluster it is important to consider the level of security required. This example will assume that a secured connection and database are desired.
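For a self-managed secure cluster, the setup boils down to starting a cockroach process on each node and initialising the cluster once. A minimal sketch for one node of a three-node cluster is shown below (the host names, certificate directory and store path are illustrative):

# run on each node, adjusting --listen-addr and --http-addr per host
cockroach start \
  --certs-dir=certs \
  --store=cockroach-data \
  --listen-addr=node1.example.com:26257 \
  --http-addr=node1.example.com:8080 \
  --join=node1.example.com:26257,node2.example.com:26257,node3.example.com:26257

# run once, from any one node, to initialise the cluster
cockroach init --certs-dir=certs --host=node1.example.com:26257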

Once a secure managed or self-managed database has been created, create a new user for the Camunda connection. We need four items of information:

  • Username
  • Password
  • Certificate
  • Connection string

If using the managed service, the user can be created from the web GUI and the password will be generated at the same time. The connection string is also available from the web GUI. The certificate can be downloaded using the following command:

curl --create-dirs -o [location to store certificate]/root.crt -O https://cockroachlabs.cloud/clusters/[ClusterID]/cert
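If using a self-managed cluster instead, the user can be created with standard SQL through the cockroach client; a sketch is shown below (the user name and the database granted are illustrative):

CREATE USER camunda WITH PASSWORD 'choose-a-strong-password';
GRANT ALL ON DATABASE defaultdb TO camunda;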

You can check the connection works by connecting via the cockroach client as shown:


cockroach sql --url 'postgres://[username]:[password]@[service URI]:26257/[ClusterName].defaultdb?sslmode=verify-full&sslrootcert=[location of stored certificate]/root.crt'

Once you have the database details, Camunda can be installed on the first server.

Download Camunda (Tomcat or default) from the Camunda downloads: https://camunda.com/download/
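For example, on the first server (the archive name and version below are illustrative; use whichever build was downloaded):

mkdir camunda && tar -xzf camunda-bpm-tomcat-7.15.0.tar.gz -C camunda
# the extracted directory is referred to as $camunda in the steps below
export camunda=$PWD/camunda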

Once Camunda is untarred, we need to configure it to use the database.

Download the PostgreSQL JDBC driver from https://jdbc.postgresql.org/download.html. The driver jar then needs to be placed in the $camunda/configuration/userlib directory (default distribution) or the $camunda/server/apache-tomcat-9.0.52/lib directory (Tomcat distribution).
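If it helps, the download and copy can be scripted as below (the driver version is illustrative; take the current 42.x jar linked from the download page):

curl -O https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
# default distribution:
cp postgresql-42.2.23.jar $camunda/configuration/userlib/
# Tomcat distribution:
cp postgresql-42.2.23.jar $camunda/server/apache-tomcat-9.0.52/lib/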

Tomcat: Edit the $camunda/server/apache-tomcat-9.0.52/conf/server.xml file to include the following:

driverClassName="org.postgresql.Driver"
url="jdbc:[connection_String_including_sslmode_and_path_to_cert]"
defaultTransactionIsolation="SERIALIZABLE"
username="[username]"
password="[password]"

Default distribution: Edit the $camunda/configuration/default.yml file to include the following:

url: jdbc:[connection_String_including_sslmode_and_path_to_cert]
driver-class-name: org.postgresql.Driver
default-transaction-isolation: SERIALIZABLE
username: [username]
password: [password]
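In default.yml these entries sit under the datasource section; a minimal sketch of the nesting is shown below (placed under spring.datasource on the assumption that the default distribution is Spring Boot based; property names are as listed above, and the rest of the shipped file is left unchanged):

spring:
  datasource:
    url: jdbc:[connection_String_including_sslmode_and_path_to_cert]
    driver-class-name: org.postgresql.Driver
    default-transaction-isolation: SERIALIZABLE
    username: [username]
    password: [password]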

Now start the Camunda server and go to the landing page to confirm the server is working. Starting Camunda will also create the tables required in the database.
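For example (the start scripts are those shipped with the 7.x downloads and may differ between versions; the port assumes the default configuration):

# Tomcat distribution:
$camunda/start-camunda.sh
# default distribution:
$camunda/start.sh

# then confirm the landing page responds, e.g.
curl -I http://[server address]:8080/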

You can now repeat the Camunda installation on servers in other regions and point these at other CockroachDB nodes in the cluster, using the same username and password. By placing the Camunda nodes behind a load balancer, the failure of any one node does not affect the ability of the system as a whole to process workflows.
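As an illustration of the load-balancing piece, a minimal HAProxy fragment for three Camunda nodes might look like the sketch below (HAProxy is only one option; the node addresses and health-check path are assumptions):

frontend camunda_front
    bind *:80
    default_backend camunda_nodes

backend camunda_nodes
    balance roundrobin
    option httpchk GET /
    server camunda1 10.0.1.10:8080 check
    server camunda2 10.0.2.10:8080 check
    server camunda3 10.0.3.10:8080 check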