Migrating to Cockroach - Part 1
[Version en Français ici]
Migrating from one database to another is never an easy task. Whatever your solution, switching to another tool requires preparation and adaptation. It is often a task that you will manage like a project.
To address this topic, I have decided to break the problem down into two parts.
In this first part, we will talk about moving the schema, the easier part.
Moving your schema
Data can be stored in a variety of formats depending on the database you use. I'll focus on the most well-known format here and talk about JSON documents and Relational Schemas.
Transforming JSON document data
Let's imagine you're storing hotels information like this:
We can see a recommendation sometimes used to add the type in the document. When moving to a SQL schema, this is typically not useful as this is the exact purpose of the schema!
We can also observe the kind of things that could make you decide to move to something different. The hotel description is pretty simple and can easily be transformed into a SQL format.
The issue is when we want to address the "reviews" attribute. This one is not a simple field but a relation between the hotel and the reviews. The issue with a NoSQL document DB is what it means to have thousands of reviews for a hotel? How do you manage the growth of the document? What the egress cost will be when moving a big JSON out of your cloud provider? What if you only want to show the last five reviews? You have different solutions to manage this, but often you'll have a Review document with an id composed of the hotel id and some sort of uniqueness or create a query to retrieve reviews dedicated to this particular hotel.
When moving to a relational database, any sub-document (which is a relation) is usually transformed into a table and a foreign key relationship with the main table. Pretty straightforward!
What is less straightforward is the "ratings" part. In this case, you can decide to once again create a Rating table with a link on a review ID. As a rating is directly associated with the review, you can also expand the review table you'll create with additional columns. The last option is to use a JSONB type to keep the ratings in JSON format and ease evolution. It's really up to you to decide what is best, depending on your use case, queries access patterns, etc.
To convert a JSON document into a table(s), you can find multiple tools that will help. But it's important to stay in control of the transformation and adjust it depending on your needs.
Once done, you can check the data types in CockroachDB to ensure a proper transformation.
Transforming relational schema
If you use an RDBMS, then you already have a schema. But it's not easier to move it as not all the databases on the market follow a standard.
Data types, Enumerations, etc. can be managed and defined in different flavors which are not compatible. In a distributed database like CockroachDB, it's a bad practice to use a sequence for uniqueness. Constraints on columns, foreign keys, etc. are not applied in the same way.
Let's explore a real world example:
Here we're creating a table for suppliers and another one for your contacts from a specific supplier. IDs are managed automatically by autoincrementing an INT.
As said earlier, it's not a best practice to use sequences for a Primary Key in CockroachDB. You can change the data type for the ID fields, but what happens with the Foreign Keys? It won't work!
This FK is perfectly valid when the ID is an INT as this is the same type in the two tables. But if you switch ID to UUID, then you also need to change the linked table definition. Your database can also rely on Stored Procedures. When not supported or in a different language, you'll have to adapt your code. But that's another topic! Long story short, whatever the system you're coming from and the destination, database migration is always something you need to prepare carefully.
Can Cockroach help me?
Identifying the required changes to be made is not an easy task. Fortunately, CockroachDB proposed a tool to help with your migration.
I've used it to demonstrate how it works with the example provided before for the suppliers' table. I ran it two times, one by updating keys to UUID, and another one by keeping keys in sequence.
I passed the whole schema creation script, not only the two tables used.
Hereafter is the result with keeping sequences:
All suggestions are about sequences:
Moving to UUID directly to evaluate the migration:
The result is different:
As discussed before, the error is due to different types. We asked to change the bosuppliers.id type to UUID, but the bosupplierpersons.supplierid is still an INT so far. What you decide to do is important for the next part, moving the data. Adapting the schema will require data transformations. Another option is to use a new column and an index. The idea is to use an invisible column that will store a unique ID and keep the original ID as an INT. The primary key will change:
In order to use this option, you can't rely on sequences as the original ID won't be incremented automatically anymore. This solution can be helpful when you'll import data by ensuring spread across the cluster. Then you need to work on your application to manage the change properly. It's not necessary to define the PK as not visible, it's just an option you have.
Conclusion
As helpful as MOLT can be, it only helps with migrating the schema by itself. It doesn't move the data or adapt the applications. It'll simplify the testing process. You can expect to have your schema ready in minutes and run a bunch of tests to ensure application queries are still valid, even without any data. If your queries return errors, then you'll need to check what's going on. The easiest part is now done. In the second part, I'll discuss moving your data into this new schema.