Active Passive vs Multi-Active Database Topologies
A Comparison of Active Passive vs Multi-Active Database Deployments
Each type of database deployment, Multi Active or Active Passive, has its pros and cons. There are a few things to consider when designing a resilient database architecture, and in this blog we will outline them.
The most basic deployment is a single site, single node architecture. This gives you nothing in terms of business continuity: it provides no high availability (HA), and the only disaster recovery (DR) mechanism is to restore your database from a backup file. This type of deployment is normally seen in less critical environments such as development, or in CI/CD pipelines where testing is automated as part of the process. Nearly all databases can be deployed in this manner, including CockroachDB, Oracle and SQL Server.
The Benefits of this model are
- Cost effective, as only one node is licensed. However, the cost of an outage to the business when running this model in production could be astronomical in terms of lost revenue and productivity.
The Cons of this model are
- Lack of HA. If the node goes down or has issues there is no failover. You have to fix the existing node or restore from a backup.
- Any maintenance that could cause downtime has to be scheduled around quiet traffic periods, but there will always be some customer or service impact when completing patching, upgrades and similar tasks.
The next level up from single node is Single Site Multi Node. This gives you more in terms of DR and can also give a little bit of HA, depending on the technologies used. In this deployment there are typically two or more nodes, in either an active passive model or a multi active model. The nodes are typically spread across different failure domains such as racks, network switches and disks.
Active Passive Model
In this model there is a single master node and one or more passive nodes. If there is an issue with the master node, a secondary node can be promoted to master and the application pointed at it. This is typically automatic but does involve some downtime while the application repoints to the new node. Recovery from failure is a lot faster than with a single node architecture, but this is still not a perfect solution for production deployments. Examples of databases that run in active passive configurations are Oracle, SQL Server, MySQL and PostgreSQL.
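To make the promotion step concrete, here is a minimal sketch of active passive failover in Python. It is a toy in-memory model, not how any particular database implements it; the `Node` and `ActivePassiveCluster` names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ActivePassiveCluster:
    """Toy model: one master takes all traffic, the rest are passive standbys."""
    primary: Node
    standbys: List[Node] = field(default_factory=list)

    def route(self) -> str:
        """Return the node the application should connect to, failing over if needed."""
        if not self.primary.healthy:
            self.fail_over()
        return self.primary.name

    def fail_over(self) -> None:
        """Promote the first healthy standby to master."""
        for candidate in self.standbys:
            if candidate.healthy:
                self.standbys.remove(candidate)
                # The old master rejoins as a standby and must be re-synced later.
                self.standbys.append(self.primary)
                self.primary = candidate
                return
        raise RuntimeError("no healthy standby available: restore from backup")


cluster = ActivePassiveCluster(Node("db1"), [Node("db2"), Node("db3")])
before = cluster.route()          # traffic goes to the master
cluster.primary.healthy = False   # simulate a master failure
after = cluster.route()           # a standby is promoted and takes over
```

Note that only one node ever serves traffic here; the standbys do nothing until `fail_over` runs, which is the idle capacity discussed in the cons below.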
The Pros of the Active Passive Model Are
- Still relatively cost effective, as some providers allow you to run a passive node at no charge provided you have an active support contract
- Provides much better HA capability than a single node architecture
The Cons of the Active Passive Model are
- Can get expensive if you have to license the secondary node(s), as you are paying for hardware resources that sit idle. The only time they are used is in the event of a disaster or failure.
- Expensive in terms of operational cost, as after a failure the servers need to be re-synced to get back to an active passive configuration.
Multi Active Model
In the multi active model all the nodes in the cluster are available for read and write operations. There is no concept of master and secondary nodes, as all nodes are equal within a multi active cluster. This gives you a lot of benefits and control over HA and DR capabilities. It also makes the cluster inherently easy to scale. Databases that can be deployed in this category include CockroachDB, Cassandra and Couchbase.
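As a rough illustration of what "all nodes are equal" means for the application, the sketch below routes requests round-robin across every healthy node. The `MultiActiveRouter` class is hypothetical; in a real deployment this job is usually done by a load balancer or the database driver.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class MultiActiveRouter:
    """Toy client-side router: every healthy node may serve reads and writes."""

    def __init__(self, nodes: List[str]) -> None:
        self.health: Dict[str, bool] = {name: True for name in nodes}
        self._ring: Iterator[str] = cycle(nodes)
        self._size = len(nodes)

    def mark(self, name: str, healthy: bool) -> None:
        """Record the result of a health check for one node."""
        self.health[name] = healthy

    def next_node(self) -> str:
        """Round-robin over the nodes, skipping unhealthy ones."""
        for _ in range(self._size):
            candidate = next(self._ring)
            if self.health[candidate]:
                return candidate
        raise RuntimeError("no healthy nodes available")


router = MultiActiveRouter(["n1", "n2", "n3"])
first_pass = [router.next_node() for _ in range(3)]   # every node takes traffic
router.mark("n2", False)                              # simulate a node failure
second_pass = [router.next_node() for _ in range(4)]  # traffic skips the failed node
```

There is no promotion step: when a node fails, the remaining nodes simply carry on serving requests.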
The Pros of the Multi Active Model are
- Scalability of both Read and Write operations
- Always on availability, meaning no downtime when completing maintenance tasks like upgrades and patching.
- Cost effective in terms of resource utilisation, as all nodes are actively used all of the time. This makes these some of the most cost effective solutions on the market, although the up-front licensing cost may be higher.
- RPO of 0 and RTO < 10 seconds
The Cons of the multi active model are
- Most of these solutions take a performance hit because of the network traffic involved between nodes, although some databases handle this better than others. CockroachDB, for example, can help reduce network latency with its Geo-Partitioning feature. The hit is usually minimal in the single site model, where the network is fast and has plenty of bandwidth.
- Some technologies, such as Cassandra, need regular maintenance jobs (i.e. repair operations) to be run to ensure the data on all nodes is replicated and consistent.
The overall con of the Single Site Deployment is that there is no coverage for a whole site or regional outage. If this is a requirement then a Multi Site Deployment is a more suitable model.
The Multi-Site Model dictates that the nodes are spread across different sites or regions. This is important if the survivability criterion is the loss of a site or region. The single biggest advantage of a Multi Site Deployment over a Single Site Deployment is that you can survive the outage of a region, data centre or site.
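A quick way to sanity-check a site layout is to ask whether enough replicas remain after any one site is lost. The sketch below assumes a database that needs a strict majority of replicas to stay available, as consensus-based systems do; that assumption, the function names and the site names are mine, so check the rules of your chosen database.

```python
from typing import Dict


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def survives_site_loss(replicas_per_site: Dict[str, int]) -> bool:
    """True if a majority of replicas remains after losing any single site."""
    total = sum(replicas_per_site.values())
    needed = majority(total)
    return all(total - count >= needed for count in replicas_per_site.values())


single_site = survives_site_loss({"dc1": 3})                        # False
two_sites = survives_site_loss({"dc1": 2, "dc2": 1})                # False
three_sites = survives_site_loss({"dc1": 1, "dc2": 1, "dc3": 1})    # True
```

Under this assumption, two sites are not enough: whichever site holds the majority becomes a single point of failure, which is why three sites is the usual minimum for this kind of deployment.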
Many of the pros and cons of Multi Active vs Active Passive are the same as in the Single Site model. However, more consideration should be given to the points below.
Network Latency - As sites are normally geographically dispersed for resiliency, network latency normally plays a part in the response time of applications. This is normally the price you have to pay for regional or data centre resiliency. Some databases, like CockroachDB, allow you to control a portion of this latency using a Geo-Partitioning feature.
DR - Backups have to be taken from all nodes within the cluster, not just a single node as in the active passive scenario.
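To put a rough number on the network latency point, the sketch below estimates how long a write waits on the network. It assumes a write commits once a majority of replicas has acknowledged it and that the node receiving the write counts as one of them; it ignores disk and processing time. The function name and the round-trip figures are illustrative only.

```python
from typing import List


def estimated_commit_latency_ms(follower_rtts_ms: List[float]) -> float:
    """Round-trip time to the slowest follower needed to reach a majority.

    The node receiving the write counts as one vote, so with n replicas
    in total it needs acknowledgements from n // 2 other replicas.
    """
    replicas = len(follower_rtts_ms) + 1
    acks_needed = replicas // 2  # majority minus the local vote
    if acks_needed == 0:
        return 0.0  # single node: no network hop
    return sorted(follower_rtts_ms)[acks_needed - 1]


same_site = estimated_commit_latency_ms([1.0, 2.0])       # all replicas in one site
one_nearby = estimated_commit_latency_ms([1.0, 70.0])     # one replica close by
all_remote = estimated_commit_latency_ms([35.0, 70.0])    # every other replica far away
```

The takeaway is that write latency is driven by the nearest replicas needed for a majority, not by the furthest site, which is why keeping data close to where it is used can claw back some of the cost of spreading sites apart.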
The best database in this category, in my opinion, is by far CockroachDB if you require consistency, scalability and performance.
The other solutions are still solid options depending on your application requirements and needs.
I hope this comparison of the pros and cons of each solution will help you decide on the correct deployment for your applications. Whichever solution you pick needs to meet your requirements in terms of business continuity (HA and DR), cost (not only in terms of $ but also in terms of operational and outage cost; a full TCO model should be considered) and performance.
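As a starting point for that TCO model, here is a minimal sketch that adds the expected cost of downtime to licence and operational spend. The `Deployment` class and its fields are hypothetical, and every figure in the usage lines is a made-up placeholder rather than real pricing or measured downtime; substitute your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    licence_cost: float          # yearly licence and support spend
    operational_cost: float      # yearly staff and infrastructure spend
    downtime_hours: float        # expected hours of outage per year
    outage_cost_per_hour: float  # lost revenue and productivity per hour

    def yearly_tco(self) -> float:
        """Licence + operations + expected cost of downtime."""
        return (self.licence_cost
                + self.operational_cost
                + self.downtime_hours * self.outage_cost_per_hour)


# Placeholder inputs only, chosen to show the arithmetic.
option_a = Deployment("option A", 10_000, 20_000, 24, 5_000)
option_b = Deployment("option B", 40_000, 30_000, 0.5, 5_000)
cheapest = min([option_a, option_b], key=Deployment.yearly_tco)
```

Even a simple model like this makes the trade-off visible: a deployment with a low licence cost can still be the expensive choice once the cost of downtime is counted.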