CockroachDB is a geo-distributed, cloud-native SQL database, crafted from the ground up (in Go) to scale reads and writes both vertically and horizontally on commodity hardware, in topologies spanning both regional, datacenter and cloud vendor boundaries.
It’s based on a multi-leader, shared-nothing architecture, where each node in a cluster is a symmetrical gateway to the entire database and equally capable of serving both reads and writes regardless of deployment topology (single vs multi-region).
Data distribution is a fundamental property of CockroachDB and as such, it uses different mechanisms to operate across a network of machines over potentially volatile networks and ephemeral hosting environments. As an example, it bootstraps it's replication and consistency model from Raft and hybrid logical locks rather than a single leader node WAL/storage and single clock source. These mechanisms and the networking complexity adds some overhead in comparison to a single-leader database architecture, which makes all such comparisons skewed. The recommendation is to run at least 3 nodes even in a local DC/rack setup. This overhead is however quickly overcome with an increased vcpu and node count, enabling CockroachDB to scale to very high QPS numbers. A single-leader DBMS system can only scale vertically and is unable to scale writes to multiple machines.
For a deep-dive into the architectural innovations of CockroachDB, I suggest reading our SIGMOD 2020 research paper.
SQL Server has a capable single node DB architecture dating back decades similar to Oracle, MySQL and PostgreSQL. However, all these traditional monolithic database systems have fundamentally different architectures and shortcomings making them unfit for purpose for the cloud and global use cases. Even when these systems are packaged into cloud service offerings, these shortcomings don't go away. These traditional SQL systems were crafted in different time pre-cloud era’s and are addressing different business problems than what CockroachDB was crafted for. A blog post from CEO and co-founder of Cockroach Labs, Spencer Kimball, from 2018 describes this in more detail in Your business will evolve. Your database shouldn’t have to.
What do I mean by using the inherent benefits of the cloud? It’s simple really: cloud-native databases can leverage the ability to quickly schedule resources within a facility, but also across facilities, cloud service providers, and even continents. This can allow them to provide scale, unmatched resilience, low-latency global operation, and data sovereignty compliance. Monolithic databases remain useful pieces of technology, but because they require scaling up just one master node, they are evolutionary dead ends. They are the products of a smaller, less-connected era, and their shortcomings risk becoming liabilities for your business as it evolves.
SQL Server is fundamentally architected to be operated on a single node in a local network. It can scale out reads across data centers or availability zones (but not writes) by replicating from the single leader to multiple followers (similar to AWS Aurora). It’s a fairly typical deployment model for any traditional SQL database that enables scaling low latency reads, but with caveats in terms of operational overhead and weaker data safety guarantees. SQL Server’s single-node architecture can present significant issues for availability, resiliency and scale for some workloads. There is no capability to write within multiple nodes or regions, and you are limited to scale the database for read-only access by replicating asynchronously to additional regions. There’s also no data locality capabilities and serving stale reads from followers may not be a good fit for many application workloads.
Things to be aware of with SQL Server
- Only operates on a single node with a scalability wall on reads and writes.
- Cross DC/region operation involves async database snapshots and replication between a single leader and followers.
- Only the leader node is consistent but defaults to READ COMMITTED exposing applications to anomalies.
- Downtime causing maintenance window for certain upgrades like OS patches, DB upgrades, and instance upgrades
- If AZ containing the leader node goes down, failover by promoting a follower node takes undefined seconds (RTO) with undefined data loss (RPO).
Things to be aware of with CockroachDB
- Horizontally scalable, geo-distributed SQL database where all nodes perform reads and writes (multi-leader)
- CockroachDB has 100x the throughput of Aurora on TPC-C 100k (for reference)
Crafted for on-prem, hybrid and cross-cloud deployments
Keep read-write capable database nodes close to apps with multi-region deployments (think CDNs);
- Allows scaling of reads and writes horizontally, while reducing latency with geo-partitioning.
Granular and configurable controls of replication and data location.
Distributed SQL engine where the entire cluster participates in queries.
Custom-built cost-based optimizer that takes full advantage of the distributed nature of the database
High availability built-in:
- No SPOFs
- Can operate without maintenance windows even when scaling up instances, patching operating systems, or making database updates.
- Zero RPO guaranteed, up to 9 sec RTO possible
Serializable isolation level - always
Natively built-in change data capture for data integration
Open Source and Enterprise version