Codd Experiment 3: Checkpoint Migrations

Codd is a utility for managing PostgreSQL migrations, written in Haskell. In the Codd Design Ideas blog entry, I describe my wish list of features, including checkpoint migrations. In this blog entry, I experiment with how checkpoint migrations can actually be managed with Codd.

A checkpoint is a migration that is equivalent to all previous migrations. I am particularly interested in creating a checkpoint to clean up the schema before release, because I happen to be working on such a checkpoint. I do not need support for later checkpoints any time soon, but note that they can be very useful. If migrations that add columns to tables result in poor alignment, then checkpoints can be used to improve database performance. Checkpoints cannot be applied to existing databases, but they can be used in new instances. They can also be used when upgrading PostgreSQL, as the database needs to be recreated in that case anyway.

Test Project

I am organizing this experiment like the one described in the Codd Experiment blog entry. I manage all migrations in a migrations directory and link to applied migrations in the Codd-managed directory, stored under a var directory. See the blog entry for details.

$ mkdir /tmp/codd-checkpoint
$ cd /tmp/codd-checkpoint
$ mkdir -p bin config migrations var/codd-{migrations,schema}
$ wget -O bin/codd \
    https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd

File config/codd-dev.sh configures the development environment.

export CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable

Test Servers

For this experiment, I use postgres containers to run PostgreSQL servers. The following command runs a server in a container named pg_codd.

$ docker run \
    --detach \
    --name "pg_codd" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"

Note that I simultaneously run another PostgreSQL server to test the checkpoint below, using a different container name and host port.

The following command loads the development environment configuration that corresponds to the above server.

$ source config/codd-dev.sh

Migrations

For a minimal test, I create three migrations.

Migration migrations/0001-init-codd.sql is a special no-op migration that is required to make Codd initialize the codd_schema.sql_migrations table correctly. See the Codd Initialization for details.

-- codd: no-txn

SELECT 1;

Migration migrations/0002-add-employee.sql adds an employee table.

CREATE TABLE employee (
  id SERIAL NOT NULL,
  name TEXT NOT NULL,
  CONSTRAINT employee_pkey PRIMARY KEY (id)
);

Migration migrations/0003-add-email.sql adds an email table.

CREATE TABLE email (
  employee__id INTEGER NOT NULL,
  address TEXT NOT NULL,
  CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
  CONSTRAINT email_employee__id_fkey
    FOREIGN KEY (employee__id)
      REFERENCES employee(id)
);

CREATE UNIQUE INDEX email_lower_address_unique
  ON email(lower(address));

The following commands add and link these migrations. Note that I replace the Codd filename timestamps with ellipsis in the commands below. If you are following along, you will have different timestamps.

$ ./bin/codd add migrations/0001-init-codd.sql
$ ln -s \
    ../var/codd-migrations/...-0001-init-codd.sql \
    migrations/0001-init-codd.sql
$ ./bin/codd add migrations/0002-add-employee.sql
$ ln -s \
    ../var/codd-migrations/...-0002-add-employee.sql \
    migrations/0002-add-employee.sql
$ ./bin/codd add migrations/0003-add-email.sql
$ ln -s \
    ../var/codd-migrations/...-0003-add-email.sql \
    migrations/0003-add-email.sql

The following command verifies that the database matches the expected schema.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.

Checkpoint Migration Verification

Migration migrations/0003-add-email.checkpoint.sql implements a checkpoint migration.

CREATE TABLE employee (
  id SERIAL NOT NULL,
  name TEXT NOT NULL,
  CONSTRAINT employee_pkey PRIMARY KEY (id)
);

CREATE TABLE email (
  employee__id INTEGER NOT NULL,
  address TEXT NOT NULL,
  CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
  CONSTRAINT email_employee__id_fkey
    FOREIGN KEY (employee__id)
      REFERENCES employee(id)
);

CREATE UNIQUE INDEX email_lower_address_unique
  ON email(lower(address));

I verify this migration using a temporary server. The following command initializes the required directories.

$ mkdir -p var/codd-{migrations,schema}-checkpoint tmp

File config/codd-checkpoint.sh configures the test environment.

export CODD_CONNECTION=postgres://postgres:password@127.0.0.1:5433/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema-checkpoint
export CODD_MIGRATION_DIRS=var/codd-migrations-checkpoint
export CODD_TXN_ISOLATION=serializable

The following command runs the temporary server in a container named pg_codd_checkpoint. This server binds to host port 5433 because port 5432 is already used by the development server.

$ docker run \
    --detach \
    --name "pg_codd_checkpoint" \
    --publish "127.0.0.1:5433:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"

The following command loads the test environment settings.

$ source config/codd-checkpoint.sh

With the test server configured, it is time to add the checkpoint migration. Note that migrations/0001-init-codd.sql is still required as a separate migration, however, because of the way that Codd initializes databases. The following commands adds the migrations. Note that I add copies of the migrations because Codd moves them when they are added. (FWIW, I quite dislike this behavior.) This is for verification, so no links are created.

$ cp migrations/0001-init-codd.sql tmp/
$ ./bin/codd add tmp/0001-init-codd.sql
$ cp migrations/0003-add-email.checkpoint.sql tmp/
$ ./bin/codd add tmp/0003-add-email.checkpoint.sql

The following command verifies that the database matches the expected schema.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.

The following command verifies that the database matches the expected schema created using separate migrations. The checkpoint is correct!

$ env CODD_EXPECTED_SCHEMA_DIR=var/codd-schema \
    ./bin/codd verify-schema
[Info] Database and expected schemas match.

Note that this command correctly flags differences in column order. When using a checkpoint for alignment, it would be useful to disregard column order. I implemented this in a fork using an --ignore-col-ord option, and it works well. (I am not linking to that implementation here because it is in a branch that will eventually be removed.)

Migration Management

Codd maintains applied migrations on the filesystem, and the codd_schema.sql_migrations table in each database stores the (timestamped) names of those migrations that have been applied to that database. In addition, Codd maintains a “snapshot” of the database state with all of those migrations applied, which can be used to confirm that a database with all migrations applied is as expected. In this test, I use var/codd-migrations and var/codd-schema directories, respectively.

When a checkpoint migration is used, the applied migrations are clearly different than without the checkpoint. Also, the database state is different if the checkpoint is used for alignment. When managing more than one database, different databases may have different migrations and states at a given time.

When creating a checkpoint to clean up the schema before release, as I am currently doing, one can simply change the applied migrations and snapshot to match the checkpoint usage. All databases must then be deleted and recreated. The history of individual migrations may be kept in the migrations directory, or they may be removed (and would therefore only be accessible via Git).

The difference in applied migrations and state is an issue when creating a later checkpoint. One solution is to maintain separate directories during the time that there are differences. One must be careful to correctly configure Codd to use these directories according to the database being used. The old directories may be removed if/when all databases have been migrated to use the checkpoint.

I did a few quick tests to confirm the behavior. The database created using individual migrations of course matches the snapshot created using the checkpoint. The list of migrations in the database is irrelevant.

$ env CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres \
    ./bin/codd verify-schema
[Info] Database and expected schemas match.

Commands like codd up make use of the list of migrations, however. It fails as expected.

$ env CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres \
    ./bin/codd up
[Info] Checking if database 'postgres' is accessible with the configured connection string... (waiting up to 5sec)
[Info] Checking which SQL migrations have already been applied...
[Info] Parse-checking headers of all pending SQL Migrations...
[Info] Applying 2023-04-03-23-49-03-0001-init-codd.sql
[Info] BEGINning transaction
[Info] Applying 2023-04-03-23-49-14-0003-add-email.checkpoint.sql
[Error] Got SQL Error: SqlStatementException {sqlStatement = "CREATE TABLE employee (\n  id SERIAL NOT NULL,\n  name TEXT NOT NULL,\n  CONSTRAINT employee_pkey PRIMARY KEY (id)\n);", psimpleError = SqlError {sqlState = "42P07", sqlExecStatus = FatalError, sqlErrorMsg = "relation \"employee\" already exists", sqlErrorDetail = "", sqlErrorHint = ""}}
[Warn] Waiting 1000ms before next try
...

Cleanup

The following commands stop the containers, remove the test project, and exit the shell that includes Codd configuration.

$ docker stop pg_codd_checkpoint | xargs docker rm
$ docker stop pg_codd | xargs docker rm
$ cd /tmp
$ rm -rf codd-checkpoint
$ exit

Conclusion

Codd can be used to manage checkpoint migrations. I am very happy to have Codd to verify them!

Working with checkpoint migrations currently requires many manual steps. Perhaps more ergonomic tooling can be added in the future.

Author

Travis Cardwell

Published

April 4, 2023