Using AWS DMS to Replicate Data from Aurora RDS to CockroachDB

Using AWS DMS to Replicate Data from Aurora RDS to CockroachDB

In this blog I will demonstrate the process of using AWS DMS to aid in a basic migration from Aurora PostgreSQL (RDS) to CockroachDB to simplify your migration needs.

Database migration is the process of transferring data from one database to another. There are many reasons why database migration might be necessary, including upgrading to a newer version of a database, changing database vendors, or moving to a cloud-based database.

Database migration is important because it allows organisations to take advantage of new features and capabilities, improve performance, reduce costs, and improve data security. For example, moving to a Highly Available Distributed SQL database such as CockroachDB can provide regional growth, survivability and assist in data compliance.

AWS DMS (Database Migration Service) is a cloud-based service that helps with database migration. It provides a simple and reliable way to migrate data from on-premises databases to AWS or from AWS Database services to other on-premise or cloud hosted databases, such as CockroachDB.

AWS DMS supports a wide range of databases, including Oracle, SQL Server, MySQL, PostgreSQL, and MongoDB, among others. It supports both one-time and continuous data replication.

Overall, AWS DMS can simplify and streamline database migration, reducing risk and improving efficiency.

Getting Started

As a Pre-Requisite for this I recommend configuring a serverless CockroachDB cluster as it's a quick and simple way to get started with CockroachDB, to get started with this we have great documentation to get you going. Once you have created a cluster, note down your user/password, the connection string and download the ca.crt for authenticating DMS later. (For DMS you will need to rename this to ca.pem)

I will however go through the steps needed to provision an Aurora instance with the required configuration for use with DMS.

We will also require a schema and some data to populate the Aurora Instance, I will provide a very simple example and some insert statements further down in the blog.

Setting up Aurora RDS in AWS.

First of all, we need a database to migrate and replicate from, for the purposes of this blog we will create a new Aurora PostgreSQL instance in the AWS console as well as an appropriate parameter group to attach to it allowing the use of CDC for the on-going replication to work within DMS. Let's get started!

  • From the AWS RDS Console, click "Parameter Groups'' and then "Create Parameter Group". You'll be presented with 4 options and a create button, fill this in as per the picture below and click create. This will create us a new parameter group with default values that we can modify to enable logical replication for CDC to work in Aurora RDS.

Screenshot-2023-04-20-at-11.03.14

  • Now that we have a fresh parameter group, navigate to the group in the console and click on it, from here we need to edit some of the default parameters. Set the rds.logical_replication parameter in your DB CLUSTER parameter group to 1. As part of applying this parameter, AWS DMS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. (There are some other parameters to check, but generally these are already the default, for more information see the AWS Documentation.)

  • We can now go ahead and create our source database (Aurora PostgreSQL) navigate to the databases section of the Amazon RDS console and hit "Create Database" You will be presented with two creation methods, as we want to attach a none default parameter group, we need to ensure we use "Standard Create" We also want to use Aurora (PostgreSQL Compatible), fill in all the required fields selecting Dev/Test, entering an identifier (name), credential settings, size, VPC, security group etc as you would normally, but under "Additional Configuration" ensure that you change the DB Cluster Parameter Group to the one we created earlier.

Screenshot-2023-04-20-at-11.12.44

  • Once everything is populated, hit create, this will take a short while to provision, so from here we can move on to creating a replication instance in DMS (Database Migration Service)

Schema Creation & Data Insertion Into Aurora

Now that we have an Aurora Instance in AWS and hopefully (if you read the first section of the blog) a CockroachDB serverless instance, we need to create a schema and populate the database with some dummy data so that we have something to replicate to Cockroach.

  • First, connect to your AuroraDB instance via your favorite GUI tool or via the command line. (If you have connection timeout, be sure to check your security group is allowing your network to connect)

  • Once you're connected to your Aurora instance, enter the below to create the example Schema we're going to migrate, in this example we'll create a users table and an orders table.

CREATE TABLE users (
    user_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    first_name     VARCHAR,
    last_name      VARCHAR
  );

CREATE TABLE orders (
  order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES users(user_id),
  product_id INT
);

  • With the Schema created, we need to insert some data, to keep things simple we can insert the same thing multiple times.
insert into users (first_name, last_name) values ('Daniel', 'Sheldon');
  • Now we have some users, let's grab one of the UUIDs from them so we can attach a dummy order to it, this is just so we can demonstrate the support for migrating tables with Foreign Key references, we will just enter the UID from the select statement and a dummy product ID to generate an order id.
select * from users 

insert into orders (user_id, product_id) values ('4f09762f-dc1b-4179-9d43-5695e0e9b9ca', 1)
  • If you want to test this and join the tables, you can run the below to verify that user now has made an order and the product_id purchased.
Select a.*, o.product_id from users a
inner join orders o on o.user_id = a.user_id
where o.product_id = 1;
  • This will be enough to demonstrate our basic migration from Aurora to CockroachDB, however please note that if you are working with more complex Schemas with options that are not supported in CockroachDB, you'll need to convert the schema first and create that in Cockroach. However, for this example, the schema is compatible on both sides.

Configuring AWS DMS for Replication

With our Aurora Database now populated with some dummy data, we can start setting up the Database Migration Service to replicate our data to CockroachDB, In this example we'll also be ensuring that on-going replication is enabled so that new inserts/changes will be replicated to our CockroachDB instance.

To get started, navigate to the RDS console within AWS, we will need to create 4 different things here, a source endpoint, a destination endpoint, a replication instance and finally our data migration task. Let's start with the replication instance as this takes the longest to create, the purpose of the replication instance is to feed the data from the source to the destination.

  • From the AWS DMS console, click "Replication instances'' on the left hand side and then "Create replication instance", most of the options here can be left default, however we will need to give it a name and I'd also recommend keeping it in the same VPC for this demonstration, the default size is also perfect for this example. Hit "Create Replication Instance" once completed you should see the below in the console.

Screenshot-2023-04-21-at-09.20.03

Now that we have our replication instance, we can set up our endpoints, the source endpoint will be our Aurora RDS instance we created in previous steps and the destination will be our CockroachDB cluster. Be sure to check your Cockroach Instance whitelists the networking within AWS, for the purpose of the blog I've allowed traffic from anywhere.

  • Begin by clicking "Endpoints" on the left hand side of the AWS DMS console and then click "Create Endpoint", this source endpoint is made quite simple by being able to use the "Select RDS DB Instance'' option, you then need to select the instance created earlier and provide access information manually, these are the credentials we took note of earlier. (You can also enter the database name you want as the target) It should look something like the below:

Screenshot-2023-04-21-at-09.26.40

Once ready, hit "Create endpoint"

  • Now do the same again for the Target/Destination Endpoint, however this time you will need to populate most of the information yourself as well as provide the CA we downloaded earlier. Important Be sure to set the Target Engine as "PostgreSQL" as that's the wire protocol CockroachDB supports and also set SSL to Verify-Full, your settings should look like the below:

Screenshot-2023-04-21-at-09.28.57

Once ready, hit "Create endpoint" Once created you should see 2 endpoints defined in the console.

Screenshot-2023-04-21-at-09.29.23

Now we are ready to configure and start our Database Migration task and get that data replicated into CockroachDB!

  • From the AWS DMS console, select "Database migration tasks" on the left hand side and hit "Create task"
  • Give the task a name, choose our replication instance and database endpoints from the drop downlists.
  • Our Migration type is going to be "Migrate existing data and replicate ongoing changes"
  • For our Task Settings, we'll use the wizard and leave everything as default (As we're using a compatible schema both sides we'll let it drop tables on the target"
  • In the Table mappings section we will also use the wizard, however you must click "Add new selection rule" and populate the information for our schema and tables, we're going to use the public schema and migrate all tables in it. It should look like the below:

Screenshot-2023-04-21-at-09.34.45

  • Once you're ready, hit "Create Task" it will take a few minutes to get going, but once it's started it look like this.

Screenshot-2023-04-21-at-09.39.49

  • You can also verify it is working by clicking in to the task and looking at the table statistics, you should be able to see the number of rows that have been replicated.

Verifying Data Replication

Now that we've set up all of our instances, our targets and our data migration task, it's time to see the cool stuff! Did the data replicate over successful? Let's find out.

The simplest way of doing this is to open up your command line of choice and connect to the cockroach instance or check the Cloud Console of your instance. I will use the CLI to verify the replication has completed.

  • Show tables;
dan@dsheldon-dms-testing-jq6.gcp-europe-west2.cockroachlabs.cloud:26257/cockroach> show tables;
  schema_name |       table_name        | type  | owner | estimated_row_count | locality
--------------+-------------------------+-------+-------+---------------------+-----------
  public      | awsdms_apply_exceptions | table | dan   |                   0 | NULL
  public      | emails                  | table | dan   |                   3 | NULL
  public      | orders                  | table | dan   |                   1 | NULL
  public      | users                   | table | dan   |                2929 | NULL
(4 rows)
  • Select * from orders;
dan@dsheldon-dms-testing-jq6.gcp-europe-west2.cockroachlabs.cloud:26257/cockroach> select * from orders;
                order_id               |               user_id                | product_id
---------------------------------------+--------------------------------------+-------------
  439bdbd2-bb49-4464-8f73-8bca00a6270e | 4f09762f-dc1b-4179-9d43-5695e0e9b9ca |          1
(1 row)


Time: 14ms total (execution 2ms / network 12ms)

Conclusion

You should now have basic working knowledge of how to perform a simple database migration to CockroachDB using AWS DMS, this is a great way to significantly reduce your migration efforts from one instance to another,

Thank you for taking the time to read this blog, I hope you found it useful.