Build a Multi-Region Python App with CockroachDB, Django and K8s
A request came across my desk the other day asking whether I had any experience with Django, and whether I would like to see if I could get it working with our multi-region capabilities. I had never heard of Django before, but never one to turn down a challenge, I accepted.
First of all let's explain what Django is and what it is for.
Django is a web framework that encourages rapid development and clean, pragmatic design of Python applications. It does this by providing a proven pattern for designing scalable web applications, with many features available straight out of the box. This saves developers from spending unnecessary time ‘reinventing the wheel’ and lets them spend more time coding the application they actually need. Starting a Django project gives you the basic layout of a web application that you can then build upon. However, this isn't a Django tutorial, so we will skip forward to the topic at hand: how can we use CockroachDB’s multi-region capabilities with Django?
As you scale your usage of multi-region clusters, you may need to keep certain subsets of data in specific localities. Keeping specific data on servers in specific geographic locations is also known as data domiciling. CockroachDB has basic support for data domiciling in multi-region clusters using the ALTER DATABASE ... PLACEMENT RESTRICTED statement.
To follow along with this blog you will need three Kubernetes clusters, ideally located in separate regions (although that's not essential), and a basic understanding of Docker and Kubernetes.
Being a Django and Python newbie, I opted to use the example Django application available here in the CockroachDB Labs documentation. This is a simple application for inserting Customers, Products and Orders into a CockroachDB database.
The updated multi-region code for this blog can be found here.
To demonstrate the multi-region capabilities of CockroachDB I will be updating the customer capture functionality to record the cloud provider the customer should be domiciled to. For example, if customer ‘Mike’ was posted from AWS then Mike’s customer record should remain on the nodes in that locality.
The technical stuff….
The application needs a few changes to accept an additional field and record the cloud in the database. The first is to the models.py file, to add the new field.
# models.py
import uuid

from django.db import models

class Customers(models.Model):
    id = models.UUIDField(
        primary_key=True,
        default=uuid.uuid4,
        editable=False)
    name = models.CharField(max_length=250)
    # New field recording which cloud the customer is domiciled to
    cloud = models.CharField(max_length=250, null=True)
Update views.py to accept the new field.
# views.py: the post handler on the customer view now reads the new cloud field
def post(self, request, *args, **kwargs):
    form_data = json.loads(request.body.decode())
    name, cloud = form_data['name'], form_data['cloud']
    c = Customers(name=name, cloud=cloud)
    c.save()
    return HttpResponse(status=200)
Change settings.py to have your database configuration.
DATABASES = {
    'default': {
        'ENGINE': 'django_cockroachdb',
        'NAME': 'django',
        'USER': 'user',
        'PASSWORD': 'password',
        'HOST': 'cockroachdb-public',
        'PORT': '26257',
        # If connecting with SSL, include the section below, replacing the
        # file paths as appropriate.
        'OPTIONS': {
            'sslmode': 'verify-full',
            'sslrootcert': '/certs/ca.crt',
            # Either sslcert and sslkey (below) or PASSWORD (above) is
            # required.
            # 'sslcert': '/certs/client.root.crt',
            # 'sslkey': '/certs/client.root.key',
        },
    },
}
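One thing worth noting: the 'django_cockroachdb' engine above is provided by the django-cockroachdb backend, which needs to be installed alongside Django. The command below is just an illustration; pick the backend version whose major.minor matches your Django version, and make sure the psycopg driver it depends on is installed too.
# Install the CockroachDB backend for Django; match its version to your Django release.
pip install django-cockroachdb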
And finally, add the additional field into the migration in the 0001_initial.py file.
operations = [
    migrations.CreateModel(
        name='Customers',
        fields=[
            ('id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
            ('name', models.CharField(max_length=250)),
            # New cloud field, mirroring the change made in models.py
            ('cloud', models.CharField(max_length=250, null=True)),
        ],
    ),
]
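Editing the generated migration by hand works fine for a small change like this; alternatively, with the project checked out locally, you could let Django regenerate it from the updated models.py. The app name below is an assumption based on the table names the example uses.
# Regenerate the migration from models.py instead of editing it by hand.
python manage.py makemigrations cockroach_example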
Now that we have an application that is ready to deploy, we need to prepare our CockroachDB cluster. The first thing we need to do is create a database for the application to consume. As my CockroachDB cluster is deployed on Kubernetes, I will deploy a secure pod with the correct certificates to connect and create a database called django.
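How you open a SQL shell will depend on how your cluster is deployed. The sketch below assumes a secure client pod named cockroachdb-client-secure with the client certificates mounted at /cockroach-certs; both the pod name and the certificate path are assumptions, so adjust them to match your own setup.
# Open a SQL shell against the cluster from the secure client pod.
kubectl exec -it cockroachdb-client-secure --context $clus1 --namespace $azregion \
  -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
From that SQL shell, create the database: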
CREATE DATABASE django;
Now that we have a database, we can deploy our application into each of our regions. By doing this, Django will create all the required database tables and so on. Again, as I am using Kubernetes, I will just deploy the manifest that is in the git repository above, making sure to set the context and deploy to the correct namespace for each region.
kubectl apply -f ./kubernetes/deployment.yaml
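I repeated the same apply against each cluster, switching the kubectl context and namespace each time. Something like the following, using the same context and namespace variables referenced below (these are my names, adjust them to yours):
kubectl apply -f ./kubernetes/deployment.yaml --context $clus1 --namespace $azregion
kubectl apply -f ./kubernetes/deployment.yaml --context $clus2 --namespace $aws_region
kubectl apply -f ./kubernetes/deployment.yaml --context $clus3 --namespace $gcp_region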
Once the application is deployed and the load balancer service has been created, we can retrieve the external IP (or hostname, in the case of AWS) to post our data to. Here I have set an environment variable for each of my contexts and each of my namespaces.
az_app_ip=$(kubectl get svc django-service --context $clus1 --namespace $azregion -o json | jq -r '.status.loadBalancer.ingress[0].ip')
aws_app_ip=$(kubectl get svc django-service --context $clus2 --namespace $aws_region -o json | jq -r '.status.loadBalancer.ingress[0].hostname')
gcp_app_ip=$(kubectl get svc django-service --context $clus3 --namespace $gcp_region -o json | jq -r '.status.loadBalancer.ingress[0].ip')
Use the application's simple API to add three entries into the database. You will notice the second field is ‘cloud’, with a different value in each request to indicate the cloud it was posted to.
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Carl", "cloud":"azure"}' http://$az_app_ip:8000/customer/
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Mike", "cloud":"aws"}' http://$aws_app_ip:8000/customer/
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Dan", "cloud":"gcp"}' http://$gcp_app_ip:8000/customer/
Now that we have some data in our django database inside CockroachDB, we can turn our attention to the multi-region configuration.
To enable this, a number of steps need to be performed, the first of which is to set the primary region for the database and then add the additional regions. In my case the primary was uksouth in Azure, followed by eu-west-1 in AWS and europe-west4 in GCP.
ALTER DATABASE django PRIMARY REGION "uksouth";
ALTER DATABASE django ADD REGION "eu-west-1";
ALTER DATABASE django ADD REGION "europe-west4";
For the cockroach_example_customers table we want to locate the data based on the value in the cloud column. This means that the right table locality for optimizing access to the data is REGIONAL BY ROW. The statements below use a CASE expression to put data for a given cloud in the right region.
ALTER TABLE cockroach_example_customers ADD COLUMN region crdb_internal_region AS (
CASE WHEN cloud = 'aws' THEN 'eu-west-1'
WHEN cloud = 'azure' THEN 'uksouth'
WHEN cloud = 'gcp' THEN 'europe-west4'
END
) STORED;
ALTER TABLE cockroach_example_customers ALTER COLUMN region SET NOT NULL;
ALTER TABLE cockroach_example_customers SET LOCALITY REGIONAL BY ROW AS "region";
Next, run a replication report to see which ranges are still not in compliance with your desired domiciling.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
Next, run the query suggested in the Replication Reports documentation, which shows which databases and tables contain the violating ranges.
WITH
partition_violations
AS (
SELECT
*
FROM
system.replication_constraint_stats
WHERE
violating_ranges > 0
),
report
AS (
SELECT
crdb_internal.zones.zone_id,
crdb_internal.zones.subzone_id,
target,
database_name,
table_name,
index_name,
partition_violations.type,
partition_violations.config,
partition_violations.violation_start,
partition_violations.violating_ranges
FROM
crdb_internal.zones, partition_violations
WHERE
crdb_internal.zones.zone_id
= partition_violations.zone_id
)
SELECT * FROM report;
You should see that the cockroach_example_customers table contains violating ranges. Now we can enable the placement restrictions to relocate these ranges onto the nodes in the correct locality.
ALTER DATABASE django PLACEMENT RESTRICTED;
Now that you have restricted the placement of non-voting replicas for all regional tables, you can run another replication report to see the effects. Be patient, as this can take a couple of minutes to take effect; the cluster has to move data about, and the more ranges it needs to move, the longer it will take.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
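If you want to keep an eye on progress rather than re-running the query by hand, a simple loop like the one below works; it uses the same assumed client pod and certificate paths as earlier.
# Re-run the report every 30 seconds until no violating ranges remain.
while true; do
  kubectl exec cockroachdb-client-secure --context $clus1 --namespace $azregion \
    -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public \
    --execute="SELECT count(*) FROM system.replication_constraint_stats WHERE violating_ranges > 0;"
  sleep 30
done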
The verdict…
Being relatively new to Python and Django, I found it straightforward to edit an existing application to demonstrate the multi-region capabilities of CockroachDB. This demonstrated to me how easy it is to develop Python applications with the help of the Django framework.
Data domiciling, or in layman's terms pinning data to specific localities, with CockroachDB can be really helpful for increasing the performance of reads and writes. By pinning ranges to specific locations, you reduce the round-trip time for consensus decisions to be made, which reduces write latencies. An additional benefit of this capability is that by controlling the locality of your data you can conform to data sovereignty or ownership legislation. So if you are looking to create multi-region Python applications backed by a relational database, then Django and CockroachDB are a good choice.
Don’t forget, all the code I used in the blog is available here.