Utilise the included demo feature of CockroachDB to discover how to geographically position your data

Utilise the included demo feature of CockroachDB to discover how to geographically position your data


Example setup, configuration and test

1. Stand up a demo multi-region cluster locally using CockroachDB Demo

$ cockroach demo --no-example-database --nodes 9 --global


2. Create a Database

CREATE DATABASE db PRIMARY REGION "us-east1" REGIONS "us-west1", "europe-west1";

USE db;


3. Create a table with Region by Row functionality

CREATE TABLE public.users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
customer_name STRING NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
UNIQUE INDEX users_customername_key (customer_name ASC)
);


4. Alter the table and add in a ‘Region’ column for RBR

SET enable_multiregion_placement_policy=on;
ALTER DATABASE db PLACEMENT RESTRICTED;
ALTER TABLE users ADD COLUMN region crdb_internal_region;
ALTER TABLE users ALTER COLUMN region SET NOT NULL;
ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS region;


5. Lets see how it looks

SELECT create_statement FROM [SHOW CREATE TABLE users];


6. Lets have a closer look - We can see that the table PK and Unique Index have implicit partitions

SELECT partition_name, index_name, zone_config FROM [SHOW PARTITIONS FROM TABLE users];


7. With an implicit crdb_region column on each index

SELECT index_name, seq_in_index, column_name, implicit FROM [SHOW INDEXES FROM users];


8. Now lets look at the EXPLAIN plan for inserting users

ANALYZE users;

EXPLAIN INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');



9. Lets do an actual insert

INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');


10. Lets use EXPLAIN to see how the optimiser plans a locally optimised partitioned index scan of the UNIQUE INDEX on username(first run analyse to refresh statistics)

ANALYZE users;

EXPLAIN SELECT * FROM users WHERE customer_name = 'kharris";


11. Running this query is super fast as kharris is located in us-east1 the default local region of the cluster, because the data was found locally there is no need to search remote nodes

SELECT * FROM users WHERE customer_name = 'kharris';

id | customer_name | region ---------------------------------------+--------------+-----------

4fe11d28-1a35-4e88-b9b5-46197c6edadd | kharris | us-east1 (1 row)

Time: 27ms total (execution 27ms / network 0ms)


12. Running it against user sirwin is slower as that record is located in remote region us- west1

SELECT * FROM users WHERE customer_name = 'sirwin';

id | customer_name | region ---------------------------------------+--------------+-----------

d3ea0fec-c94d-46da-a4cc-9e6ecc2cf671 | sirwin | us-west1 (1 row)

Time: 86ms total (execution 86ms / network 0ms)


13. We can see from the EXPLAIN ANALYZE that we checked locally in us-east1 first, but this is a negligible latency increase as we are looking within the same region.

The DistSQL diagram (the distributed SQL plan visible by opening the URL after Diagram:) shows that less than 1ms was spent in the local lookup. We can see that sirwin is located in us-west1 as this is where we see a row count of 1

EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'sirwin’;

planning time: 22ms
execution time: 121ms
distribution: local
vectorized: true
rows read from KV: 1 (41 B) cumulative time spent in KV: 164ms
maximum memory usage: 60 KiB
network usage: 0 B (0 messages) regions: us-east1

• union all
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ estimated row count: 1
│ limit: 1

├── • scan
│ nodes: n1
│ regions: us-east1
actual row count: 0
│ KV time: 13ms
│ KV contention time: 0μs
│ KV rows read: 0
│ KV bytes read: 0 B
│ estimated max memory allocated: 20 KiB
│ estimated row count: 1 (100% of the table; stats collected 10 minutes ago) │ table: users@users_customername_key
│ spans: [/'us-east1'/'sirwin' - /'us-east1'/'sirwin']

└── • scan

nodes: n1
regions: us-east1
actual row count: 1
estimated row count: 1 (100% of the table; stats collected 10 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'sirwin' - /'europe-west1'/'sirwin'] [/'us-west1'/'sirwin' - /'us-west1'/'sirwin’]


EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'kharris’;

Shortened for brevity:
├── • scan
│  nodes: n1
│  regions: us-east1
│  actual row count: 1
│  KV time: 8ms
│  KV contention time: 0μs
│  KV rows read: 1
│  KV bytes read: 41 B
│  estimated max memory allocated: 20 KiB
│  estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
│  table: users@users_customername_key
│  spans: [/'us-east1'/'kharris' - /'us-east1'/'kharris'] │
└── • scan
nodes: n1
regions: us-east1
actual row count: 0
estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'kharris' - /'europe-west1'/'kharris'] [/'us-west1'/'kharris' - /'us- west1'/'kharris']