Utilise the included demo feature of CockroachDB to discover how to geographically position your data
Example setup, configuration and test
1. Stand up a demo multi-region cluster locally using CockroachDB Demo
$ cockroach demo --no-example-database --nodes 9 --global
2. Create a Database
CREATE DATABASE db PRIMARY REGION "us-east1" REGIONS "us-west1", "europe-west1";
USE db;
3. Create a table with Region by Row functionality
CREATE TABLE public.users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
customer_name STRING NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
UNIQUE INDEX users_customername_key (customer_name ASC)
);
4. Alter the table and add in a ‘Region’ column for RBR
SET enable_multiregion_placement_policy=on;
ALTER DATABASE db PLACEMENT RESTRICTED;
ALTER TABLE users ADD COLUMN region crdb_internal_region;
ALTER TABLE users ALTER COLUMN region SET NOT NULL;
ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS region;
5. Lets see how it looks
SELECT create_statement FROM [SHOW CREATE TABLE users];
6. Lets have a closer look - We can see that the table PK and Unique Index have implicit partitions
SELECT partition_name, index_name, zone_config FROM [SHOW PARTITIONS FROM TABLE users];
7. With an implicit crdb_region column on each index
SELECT index_name, seq_in_index, column_name, implicit FROM [SHOW INDEXES FROM users];
8. Now lets look at the EXPLAIN plan for inserting users
ANALYZE users;
EXPLAIN INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');
9. Lets do an actual insert
INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');
10. Lets use EXPLAIN to see how the optimiser plans a locally optimised partitioned index scan of the UNIQUE INDEX on username(first run analyse to refresh statistics)
ANALYZE users;
EXPLAIN SELECT * FROM users WHERE customer_name = 'kharris";
11. Running this query is super fast as kharris is located in us-east1 the default local region of the cluster, because the data was found locally there is no need to search remote nodes
SELECT * FROM users WHERE customer_name = 'kharris';
id | customer_name | region ---------------------------------------+--------------+-----------
4fe11d28-1a35-4e88-b9b5-46197c6edadd | kharris | us-east1 (1 row)
Time: 27ms total (execution 27ms / network 0ms)
12. Running it against user sirwin is slower as that record is located in remote region us- west1
SELECT * FROM users WHERE customer_name = 'sirwin';
id | customer_name | region ---------------------------------------+--------------+-----------
d3ea0fec-c94d-46da-a4cc-9e6ecc2cf671 | sirwin | us-west1 (1 row)
Time: 86ms total (execution 86ms / network 0ms)
13. We can see from the EXPLAIN ANALYZE that we checked locally in us-east1 first, but this is a negligible latency increase as we are looking within the same region.
The DistSQL diagram (the distributed SQL plan visible by opening the URL after Diagram:) shows that less than 1ms was spent in the local lookup. We can see that sirwin is located in us-west1 as this is where we see a row count of 1
EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'sirwin’;
planning time: 22ms
execution time: 121ms
distribution: local
vectorized: true
rows read from KV: 1 (41 B) cumulative time spent in KV: 164ms
maximum memory usage: 60 KiB
network usage: 0 B (0 messages) regions: us-east1
• union all
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ estimated row count: 1
│ limit: 1
│
├── • scan
│ nodes: n1
│ regions: us-east1
│ actual row count: 0
│ KV time: 13ms
│ KV contention time: 0μs
│ KV rows read: 0
│ KV bytes read: 0 B
│ estimated max memory allocated: 20 KiB
│ estimated row count: 1 (100% of the table; stats collected 10 minutes ago) │ table: users@users_customername_key
│ spans: [/'us-east1'/'sirwin' - /'us-east1'/'sirwin']
│
└── • scan
nodes: n1
regions: us-east1
actual row count: 1
estimated row count: 1 (100% of the table; stats collected 10 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'sirwin' - /'europe-west1'/'sirwin'] [/'us-west1'/'sirwin' - /'us-west1'/'sirwin’]
EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'kharris’;
Shortened for brevity:
├── • scan
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ KV time: 8ms
│ KV contention time: 0μs
│ KV rows read: 1
│ KV bytes read: 41 B
│ estimated max memory allocated: 20 KiB
│ estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
│ table: users@users_customername_key
│ spans: [/'us-east1'/'kharris' - /'us-east1'/'kharris'] │
└── • scan
nodes: n1
regions: us-east1
actual row count: 0
estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'kharris' - /'europe-west1'/'kharris'] [/'us-west1'/'kharris' - /'us- west1'/'kharris']