Beginners Guide to JSONB in CockroachDB

In this blog we will cover the fundamentals of using JSONB in cockroach and some of the common ways to access and match data. This will cover:

  • Creating tables
  • Populating data
  • Select data and subsets of data
  • Filtering data
  • Updating data
  • Matching between tables
  • Performance tuning
  • How to making life easier for ourselves.

What is CockroachDB?

CockroachDB a SQL database, built to be cloud-native and provide class-leading high availability, scalability, and data consistency. It uses a distributed architecture that ensures that data is always available even in the event of hardware, cloud or network failures.

CockroachDB is also SQL-compliant, meaning that developers can use familiar SQL commands to interact with the database. It is designed to support serialisable isolation transactions with immediate consistency, which ensures that data is always consistent and reliable.

What is JSONB?

JSONB is a data type in relational databases which is used to store JSON (JavaScript Object Notation) data in a more efficient way than the traditional JSON data type.

JSONB allows developers to store complex and nested data structures in a single column, making it ideal for applications that deal with semi-structured data. It also supports the ability to query and manipulate JSON data using SQL commands, making it easy to work with for developers who are already familiar with SQL.

How does CockroachDB implement JSONB?

CockroachDB supports JSONB as a SQL data type, which means that it can store JSON data in a binary format in a table column. When data is stored in a JSONB column, it is automatically validated to ensure that it conforms to the JSON standard.

CockroachDB provides several functions and operators that can be used to query and manipulate JSONB data. These include functions for querying JSONB data, documented here.

CockroachDB also provides support for indexing JSONB data, which can improve query performance. JSONB data can be indexed using a GIN (Generalized Inverted Index).

CockroachDB and JSONB, step by step.

We will consider a very basic application for a bookshop with 3 tables:

  • Stock
  • Customers
  • Orders

Each table will store data as schema on read utilising the JSONB datatype.

Create tables:

CREATE TABLE public.stock (id UUID primary key default gen_random_uuid()
                          ,item JSONB);

Import data:

Run the following file: populate_stock.sql

Simple Selects:

select * from stock;

As with standard SQL this gives us the raw data stored in the table.

We can extract specific values from JSONB with the “->” (JSON) and “->>” (String) operators as so:

select item-'series' from stock;

Output:

We can also select pieces of information from the array elements within the JSON date:

select item->'books'->0->'ISBN' from stock;

Output:

We can also use CockroachDBs built-in functions to present the data in other formats:

select item->'series', jsonb_array_elements(item->'books')->'ISBN' from stock;

Output:

Filter results

As this is a SQL table, we can use statements to limit the output. In this example we are bringing back just those books in the “Sharpe” series and including the row number in the output.

select row_number() over() as count
      ,item->'series' as series
      ,jsonb_array_elements(item->'books')->'ISBN' as ISBN
      ,jsonb_array_elements(item->'books')->>'title' as title
      ,jsonb_array_elements(item->'books')->'price' as price
  from stock
 where item->>'series' = 'Sharpe';

Output:

Simple update

We can also update the JSONB data, this can be done by overwriting the whole row, or using the jsonb_set() to overwrite a specific element of the data.

update stock 
   set item = '{"books": [{"ISBN": "978-0140119925", "author": "Bernard Cornwell", "copies": 9, "genre": "Historical Fiction", "pages": 383, "price": 2.99, "publisher": "Penguin", "title": "Sharpes Eagle"}, {"ISBN": "978-0140119932", "author": "Bernard Cornwell", "copies": 12, "genre": "Historical Fiction", "pages": 384, "price": 9.99, "publisher": "Penguin", "title": "Sharpes Gold"}, {"ISBN": "978-0143034936", "author": "Bernard Cornwell", "copies": 23, "genre": "Historical Fiction", "pages": 352, "price": 11.99, "publisher": "Penguin", "title": "Sharpes Escape"}, {"ISBN": "978-0060082736", "author": "Bernard Cornwell", "copies": 15, "genre": "Historical Fiction", "pages": 400, "price": 12.99, "publisher": "HarperCollins", "title": "Sharpes Fury"}, {"ISBN": "978-0061097659", "author": "Bernard Cornwell", "copies": 15, "genre": "Historical Fiction", "pages": 352, "price": 10.99, "publisher": "HarperCollins", "title": "Sharpes Battle"}, {"ISBN": "978-0061099356", "author": "Bernard Cornwell", "copies": 42, "genre": "Historical Fiction", "pages": 352, "price": 9.99, "publisher": "HarperCollins", "title": "Sharpes Company"}, {"ISBN": "978-0061098625", "author": "Bernard Cornwell", "copies": 2, "genre": "Historical Fiction", "pages": 288, "price": 11.99, "publisher": "HarperCollins", "title": "Sharpes Sword"}, {"ISBN": "978-0060088776", "author": "Bernard Cornwell", "copies": 17, "genre": "Historical Fiction", "pages": 320, "price": 13.99, "publisher": "HarperCollins", "title": "Sharpes Havoc"}, {"ISBN": "978-0061097642", "author": "Bernard Cornwell", "copies": 7, "genre": "Historical Fiction", "pages": 288, "price": 10.99, "publisher": "HarperCollins", "title": "Sharpes Rifles"}], "series": "Sharpe"}' 
 where id = 'f65d4632-6f3e-47bf-a888-3fbb918d5567';

However, this is cumbersome, a more elegant way of doing this is to use the jsonb_set() function:

update stock set item = jsonb_set(item, '{books, 2, "price"}', '5.49') where item->>'series' = 'Sharpe';

Matching tables

Additionally we can use the relational nature of the database to extract information across tables.


Create the underlying tables and data with the following script: populate_cust_ord.sql

We can now verify the data within the tables:

select order_details->>'customer_name'
      ,order_details->>'customer_email' 
  from orders;

Output:

select customer_profile->'credit_card'->>'cardholder_name'
      ,customer_profile->>'email' 
  from customers;

Output:

We now have two tables where there is a relationship between data within the JSONB column. We can effectively join on these columns using the following:

select customer_profile-'billing_address' as billing_address
      ,order_details-'shipping_address' as shipping_address
  from customers
      ,orders 
 where order_details->>'customer_email' = customer_profile->>'email';

Output:

This enables us to use semi-structured data within the database as if it was structured data with relationships.

JSONB performance

The performance of JSONB can be improved by using General Inverted indexes. A Generalized Inverted Index (GIN) is a type of index used in databases to efficiently search and retrieve data from complex data types such as JSON.

In a GIN, the values of a particular column are analyzed and indexed, and the resulting index contains a list of all the values and their corresponding row identifiers. This allows the database to quickly find all the rows that contain a particular value or set of values, even if those values are nested within more complex data structures.

For example, if a GIN is created on a column that contains JSON data, the index can be used to efficiently search for specific keys or values within the JSON data, even if they are deeply nested within the JSON object.

Using the “explain analyze” feature, we can see how CockroachDB retrieves the data without indexes:

SELECT A:

explain analyze 
select order_details-'shipping_address' as shipping_address 
  from orders 
 where order_details->'customer_email' = '"lisakim@example.com"';

SELECT B:

explain analyze 
select customer_profile-'billing_address' as billing_address 
  from customers 
 where customer_profile->'email' = '"lisakim@example.com"';

SELECT C:

explain analyze 
select customer_profile-'billing_address' as billing_address
      ,order_details-'shipping_address' as shipping_address 
  from orders
      ,customers
 where order_details->'customer_email' = customer_profile->'email' 
   and customer_profile->'email' = '"lisakim@example.com"' 
   and order_details->'customer_email' = '"lisakim@example.com"';


We now create inverted indexes on the two tables:

create inverted index on customers(customer_profile);
create inverted index on orders(order_details);

And run the “explain analyse” SQL commands  again:

SELECT A:

SELECT B:

SELECT C:

As seen in the table below, using GINs reduces the amount of rows needed to be read from the database and so improves overall performance.

SELECT statement

Rows Before GIN

Rows After GIN

Improvement with GIN

A

8

2

75%

B

7

2

71%

C

15

4

73%

Making life easier for SQL developers

If we can know in advance there are certain fields within the JSON data that we are likely to want to filter or match on, we can simplify the use of the data and improved performance by using the ability of CockroachDB to store a computed column.

ALTER TABLE customers ADD COLUMN email string AS (customer_profile->>'email') STORED;
ALTER TABLE orders ADD COLUMN email string AS (order_details->>'customer_email') STORED;

Doing this allows us to directly access and filter data on this new column:

explain analyze 
select customer_profile-'billing_address' as billing_address
      ,order_details-'shipping_address' as shipping_address 
  from customers c
      ,orders o 
 where o.email = c.email 
   and c.email = 'lisakim@example.com';

Finally, this also allows us to performance tune as we would with any other database:

create index on customers(email);
create index on orders(email);

Conclusion:

JSONB is a powerful data type that allows developers to store semi-structured and nested data structures efficiently in CockroachDB. With its support for indexing, querying, and manipulating JSONB data using SQL commands, CockroachDB provides a flexible and efficient way to work with semi-structured data at scale, with the levels of availability required with modern critical workloads.