data locality

Utilise the included demo feature of CockroachDB to discover how to geographically position your data

Paul Lovett

Feb 16, 2023 • 4 min read

Example setup, configuration and test

1. Stand up a demo multi-region cluster locally using CockroachDB Demo

$ cockroach demo --no-example-database --nodes 9 --global

2. Create a Database

CREATE DATABASE db PRIMARY REGION "us-east1" REGIONS "us-west1", "europe-west1";

USE db;

3. Create a table with Region by Row functionality

CREATE TABLE public.users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
customer_name STRING NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
UNIQUE INDEX users_customername_key (customer_name ASC)
);

4. Alter the table and add in a ‘Region’ column for RBR

SET enable_multiregion_placement_policy=on;
ALTER DATABASE db PLACEMENT RESTRICTED;
ALTER TABLE users ADD COLUMN region crdb_internal_region;
ALTER TABLE users ALTER COLUMN region SET NOT NULL;
ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS region;

5. Lets see how it looks

SELECT create_statement FROM [SHOW CREATE TABLE users];

6. Lets have a closer look - We can see that the table PK and Unique Index have implicit partitions

SELECT partition_name, index_name, zone_config FROM [SHOW PARTITIONS FROM TABLE users];

7. With an implicit crdb_region column on each index

SELECT index_name, seq_in_index, column_name, implicit FROM [SHOW INDEXES FROM users];

8. Now lets look at the EXPLAIN plan for inserting users

ANALYZE users;

EXPLAIN INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');

9. Lets do an actual insert

INSERT INTO users (region, customer_name)
VALUES ('us-west1', 'sirwin'),
('us-west1', 'krudd07'),
('europe-west1', 'bojo'),
('us-east1', 'gfieri'), ('us-east1', 'kharris');

10. Lets use EXPLAIN to see how the optimiser plans a locally optimised partitioned index scan of the UNIQUE INDEX on username(first run analyse to refresh statistics)

ANALYZE users;

EXPLAIN SELECT * FROM users WHERE customer_name = 'kharris";

11. Running this query is super fast as kharris is located in us-east1 the default local region of the cluster, because the data was found locally there is no need to search remote nodes

SELECT * FROM users WHERE customer_name = 'kharris';

id | customer_name | region ---------------------------------------+--------------+-----------

4fe11d28-1a35-4e88-b9b5-46197c6edadd | kharris | us-east1 (1 row)

Time: 27ms total (execution 27ms / network 0ms)

12. Running it against user sirwin is slower as that record is located in remote region us- west1

SELECT * FROM users WHERE customer_name = 'sirwin';

id | customer_name | region ---------------------------------------+--------------+-----------

d3ea0fec-c94d-46da-a4cc-9e6ecc2cf671 | sirwin | us-west1 (1 row)

Time: 86ms total (execution 86ms / network 0ms)

13. We can see from the EXPLAIN ANALYZE that we checked locally in us-east1 first, but this is a negligible latency increase as we are looking within the same region.

The DistSQL diagram (the distributed SQL plan visible by opening the URL after Diagram:) shows that less than 1ms was spent in the local lookup. We can see that sirwin is located in us-west1 as this is where we see a row count of 1

EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'sirwin’;

planning time: 22ms
execution time: 121ms
distribution: local
vectorized: true
rows read from KV: 1 (41 B) cumulative time spent in KV: 164ms
maximum memory usage: 60 KiB
network usage: 0 B (0 messages) regions: us-east1

• union all
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ estimated row count: 1
│ limit: 1
│
├── • scan
│ nodes: n1
│ regions: us-east1
│ actual row count: 0
│ KV time: 13ms
│ KV contention time: 0μs
│ KV rows read: 0
│ KV bytes read: 0 B
│ estimated max memory allocated: 20 KiB
│ estimated row count: 1 (100% of the table; stats collected 10 minutes ago) │ table: users@users_customername_key
│ spans: [/'us-east1'/'sirwin' - /'us-east1'/'sirwin']
│
└── • scan

nodes: n1
regions: us-east1
actual row count: 1
estimated row count: 1 (100% of the table; stats collected 10 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'sirwin' - /'europe-west1'/'sirwin'] [/'us-west1'/'sirwin' - /'us-west1'/'sirwin’]

EXPLAIN ANALYZE (DISTSQL) SELECT * FROM users WHERE customer_name = 'kharris’;

Shortened for brevity:
├── • scan
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ KV time: 8ms
│ KV contention time: 0μs
│ KV rows read: 1
│ KV bytes read: 41 B
│ estimated max memory allocated: 20 KiB
│ estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
│ table: users@users_customername_key
│ spans: [/'us-east1'/'kharris' - /'us-east1'/'kharris'] │
└── • scan
nodes: n1
regions: us-east1
actual row count: 0
estimated row count: 1 (100% of the table; stats collected 4 minutes ago)
table: users@users_customername_key
spans: [/'europe-west1'/'kharris' - /'europe-west1'/'kharris'] [/'us-west1'/'kharris' - /'us- west1'/'kharris']

Sign up for more like this.