Codd Experiment 3: Checkpoint Migrations
Codd is a utility for managing PostgreSQL migrations, written in Haskell. In the Codd Design Ideas blog entry, I describe my wish list of features, including checkpoint migrations. In this blog entry, I experiment with how checkpoint migrations can actually be managed with Codd.
A checkpoint is a migration that is equivalent to all previous migrations. I am particularly interested in creating a checkpoint to clean up the schema before release, because I happen to be working on such a checkpoint. I do not need support for later checkpoints any time soon, but note that they can be very useful. If migrations that add columns to tables result in poor alignment, then checkpoints can be used to improve database performance. Checkpoints cannot be applied to existing databases, but they can be used in new instances. They can also be used when upgrading PostgreSQL, as the database needs to be recreated in that case anyway.
Test Project
I am organizing this experiment like the one described in the Codd Experiment blog entry.
I manage all migrations in a migrations
directory and link
to applied migrations in the Codd-managed directory, stored under a
var
directory. See the blog entry for details.
$ mkdir /tmp/codd-checkpoint
$ cd /tmp/codd-checkpoint
$ mkdir -p bin config migrations var/codd-{migrations,schema}
$ wget -O bin/codd \
https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd
File config/codd-dev.sh
configures the development
environment.
export CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable
Test Servers
For this experiment, I use postgres
containers to run PostgreSQL servers. The following command runs a
server in a container named pg_codd
.
$ docker run \
--detach \
--name "pg_codd" \
--publish "127.0.0.1:5432:5432" \
--env POSTGRES_DB="postgres" \
--env POSTGRES_USER="postgres" \
--env POSTGRES_PASSWORD="password" \
"postgres:11-alpine"
Note that I simultaneously run another PostgreSQL server to test the checkpoint below, using a different container name and host port.
The following command loads the development environment configuration that corresponds to the above server.
$ source config/codd-dev.sh
Migrations
For a minimal test, I create three migrations.
Migration migrations/0001-init-codd.sql
is a special
no-op migration that is required to make Codd initialize the
codd_schema.sql_migrations
table correctly. See the Codd
Initialization for details.
-- codd: no-txn
SELECT 1;
Migration migrations/0002-add-employee.sql
adds an
employee
table.
CREATE TABLE employee (
id SERIAL NOT NULL,
NOT NULL,
name TEXT CONSTRAINT employee_pkey PRIMARY KEY (id)
);
Migration migrations/0003-add-email.sql
adds an
email
table.
CREATE TABLE email (
INTEGER NOT NULL,
employee__id NOT NULL,
address TEXT CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
CONSTRAINT email_employee__id_fkey
FOREIGN KEY (employee__id)
REFERENCES employee(id)
);
CREATE UNIQUE INDEX email_lower_address_unique
ON email(lower(address));
The following commands add and link these migrations. Note that I replace the Codd filename timestamps with ellipsis in the commands below. If you are following along, you will have different timestamps.
$ ./bin/codd add migrations/0001-init-codd.sql
$ ln -s \
../var/codd-migrations/...-0001-init-codd.sql \
migrations/0001-init-codd.sql
$ ./bin/codd add migrations/0002-add-employee.sql
$ ln -s \
../var/codd-migrations/...-0002-add-employee.sql \
migrations/0002-add-employee.sql
$ ./bin/codd add migrations/0003-add-email.sql
$ ln -s \
../var/codd-migrations/...-0003-add-email.sql \
migrations/0003-add-email.sql
The following command verifies that the database matches the expected schema.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
Checkpoint Migration Verification
Migration migrations/0003-add-email.checkpoint.sql
implements a checkpoint migration.
CREATE TABLE employee (
id SERIAL NOT NULL,
NOT NULL,
name TEXT CONSTRAINT employee_pkey PRIMARY KEY (id)
);
CREATE TABLE email (
INTEGER NOT NULL,
employee__id NOT NULL,
address TEXT CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
CONSTRAINT email_employee__id_fkey
FOREIGN KEY (employee__id)
REFERENCES employee(id)
);
CREATE UNIQUE INDEX email_lower_address_unique
ON email(lower(address));
I verify this migration using a temporary server. The following command initializes the required directories.
$ mkdir -p var/codd-{migrations,schema}-checkpoint tmp
File config/codd-checkpoint.sh
configures the test
environment.
export CODD_CONNECTION=postgres://postgres:password@127.0.0.1:5433/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema-checkpoint
export CODD_MIGRATION_DIRS=var/codd-migrations-checkpoint
export CODD_TXN_ISOLATION=serializable
The following command runs the temporary server in a container named
pg_codd_checkpoint
. This server binds to host port 5433
because port 5432 is already used by the development server.
$ docker run \
--detach \
--name "pg_codd_checkpoint" \
--publish "127.0.0.1:5433:5432" \
--env POSTGRES_DB="postgres" \
--env POSTGRES_USER="postgres" \
--env POSTGRES_PASSWORD="password" \
"postgres:11-alpine"
The following command loads the test environment settings.
$ source config/codd-checkpoint.sh
With the test server configured, it is time to add the checkpoint
migration. Note that migrations/0001-init-codd.sql
is still
required as a separate migration, however, because of the way that Codd
initializes databases. The following commands adds the migrations. Note
that I add copies of the migrations because Codd moves them when they
are added. (FWIW, I quite dislike this behavior.) This is for
verification, so no links are created.
$ cp migrations/0001-init-codd.sql tmp/
$ ./bin/codd add tmp/0001-init-codd.sql
$ cp migrations/0003-add-email.checkpoint.sql tmp/
$ ./bin/codd add tmp/0003-add-email.checkpoint.sql
The following command verifies that the database matches the expected schema.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
The following command verifies that the database matches the expected schema created using separate migrations. The checkpoint is correct!
$ env CODD_EXPECTED_SCHEMA_DIR=var/codd-schema \
./bin/codd verify-schema
[Info] Database and expected schemas match.
Note that this command correctly flags differences in column order.
When using a checkpoint for alignment, it would be useful to disregard
column order. I implemented this in a fork using an
--ignore-col-ord
option, and it works well. (I am not
linking to that implementation here because it is in a branch that will
eventually be removed.)
Migration Management
Codd maintains applied migrations on the filesystem, and the
codd_schema.sql_migrations
table in each database stores
the (timestamped) names of those migrations that have been applied to
that database. In addition, Codd maintains a “snapshot” of the database
state with all of those migrations applied, which can be used to confirm
that a database with all migrations applied is as expected. In this
test, I use var/codd-migrations
and
var/codd-schema
directories, respectively.
When a checkpoint migration is used, the applied migrations are clearly different than without the checkpoint. Also, the database state is different if the checkpoint is used for alignment. When managing more than one database, different databases may have different migrations and states at a given time.
When creating a checkpoint to clean up the schema before release, as
I am currently doing, one can simply change the applied migrations and
snapshot to match the checkpoint usage. All databases must then be
deleted and recreated. The history of individual migrations may be kept
in the migrations
directory, or they may be removed (and
would therefore only be accessible via Git).
The difference in applied migrations and state is an issue when creating a later checkpoint. One solution is to maintain separate directories during the time that there are differences. One must be careful to correctly configure Codd to use these directories according to the database being used. The old directories may be removed if/when all databases have been migrated to use the checkpoint.
I did a few quick tests to confirm the behavior. The database created using individual migrations of course matches the snapshot created using the checkpoint. The list of migrations in the database is irrelevant.
$ env CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres \
./bin/codd verify-schema
[Info] Database and expected schemas match.
Commands like codd up
make use of the list of
migrations, however. It fails as expected.
$ env CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres \
./bin/codd up
[Info] Checking if database 'postgres' is accessible with the configured connection string... (waiting up to 5sec)
[Info] Checking which SQL migrations have already been applied...
[Info] Parse-checking headers of all pending SQL Migrations...
[Info] Applying 2023-04-03-23-49-03-0001-init-codd.sql
[Info] BEGINning transaction
[Info] Applying 2023-04-03-23-49-14-0003-add-email.checkpoint.sql
[Error] Got SQL Error: SqlStatementException {sqlStatement = "CREATE TABLE employee (\n id SERIAL NOT NULL,\n name TEXT NOT NULL,\n CONSTRAINT employee_pkey PRIMARY KEY (id)\n);", psimpleError = SqlError {sqlState = "42P07", sqlExecStatus = FatalError, sqlErrorMsg = "relation \"employee\" already exists", sqlErrorDetail = "", sqlErrorHint = ""}}
[Warn] Waiting 1000ms before next try
...
Cleanup
The following commands stop the containers, remove the test project, and exit the shell that includes Codd configuration.
$ docker stop pg_codd_checkpoint | xargs docker rm
$ docker stop pg_codd | xargs docker rm
$ cd /tmp
$ rm -rf codd-checkpoint
$ exit
Conclusion
Codd can be used to manage checkpoint migrations. I am very happy to have Codd to verify them!
Working with checkpoint migrations currently requires many manual steps. Perhaps more ergonomic tooling can be added in the future.