
Codd Experiment

Codd is a utility for managing PostgreSQL migrations, written in Haskell. There are some things that I want to do that Codd is not designed for, and this blog entry describes an experiment with implementing them. Please note that this is not the normal way to use Codd.

The following are my goals for this experiment:

  • I want to use Codd in environments where the database, role, and permissions are configured externally. Codd should never create or configure these things; it should just manage whatever database is specified by the connection string.
  • I want to use Codd in environments that use password authentication. While it is fine to include default credentials for local development environments in the repository, it is critical that credentials for other environments are not leaked.
  • I want to manage multi-step migrations as well as (optional) migration inverses. Defining multiple steps and inverses as separate Codd migrations is not a problem. They should be well organized, however, and they should not be lost even if they are never used.

Test Server

For this test, I use a postgres container to run a test PostgreSQL server. The following command starts the test server in a container named pg_codd. The codd_experiment database is created with owner codd, authenticated using a demonstration password that is the reverse of the role name.

$ docker run \
    --detach \
    --name "pg_codd" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="codd_experiment" \
    --env POSTGRES_USER="codd" \
    --env POSTGRES_PASSWORD="ddoc" \
    "postgres:15.2-alpine"

The following command confirms that the database is ready. It is executed within the container using a trusted connection, so no password is required.

$ echo "SELECT 1 AS ok;" \
    | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 ok
----
  1
(1 row)

Test Project

For this test, I use a temporary test project.

$ mkdir /tmp/codd-experiment
$ cd /tmp/codd-experiment

A static executable is available from the releases page.

$ mkdir bin
$ wget -O bin/codd \
    https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd

Codd manages two directories. One directory contains migrations that have been applied, and the other contains the state of the database after all of those migrations have been applied. In the Codd repository and documentation, these directories are stored in the project directory, named sql-migrations and expected-schema respectively.

Since the steps of multi-step migrations and their inverses may not all be applied, I need to use a separate directory to manage all of them. This directory needs to be easily distinguished from the Codd-managed directories. My solution is to put the managed directories under a var directory. (Directory /var in UNIX filesystems is used to store “variable data files.” It is part of the Filesystem Hierarchy Standard. UNIX folks recognize var as a directory that is generally managed by software.) I prefix them with codd- to make it clear which software manages them, which may not be obvious in a large project that uses lots of software.

$ mkdir -p var/{codd-migrations,codd-schema}

The migrations directory contains all of the migrations. It is managed by developers, not Codd. When a migration is applied, Codd moves the file to the var/codd-migrations directory; my idea is to then create a link back to the moved file from the migrations directory. This way, developers can see all migrations in one place and easily tell which ones have been applied.

$ mkdir migrations

I plan to name migration files like NNNN(.S)-DESCRIPTION(.inverse).sql. NNNN is a zero-padded, four-digit index, starting from 1. S is an optional migration step number, starting from 1 for each index. DESCRIPTION is a concise description of the migration. The .inverse.sql extension indicates that the migration is an inverse of the migration without .inverse in the name. (Note that the term inverse is borrowed from mathematics: a migration composed with its inverse is equivalent to the identity function. In other words, a migration inverse restores the schema to the state it was in before the migration.)
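For example, the migration names used later in this experiment all follow this convention.

0001-init-codd.sql
0002-init-schema.sql
0003.1-rename-employee-name.sql
0003.1-rename-employee-name.inverse.sql
0003.2-rename-employee-name.sql
0003.2-rename-employee-name.inverse.sql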

The config directory stores configuration.

$ mkdir config
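
At this point, the project directory layout should look like the following.

$ find . -type d | sort
.
./bin
./config
./migrations
./var
./var/codd-migrations
./var/codd-schema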

File config/codd-dev.sh defines the environment variables that Codd uses in development environments (local environments on developer computers). The connection string includes the development password. (Note that passwords for non-development environments must never be stored in the repository.) The Codd-managed directories are configured, and the default transaction isolation level is set to serializable for this test.

export CODD_CONNECTION=postgres://codd:ddoc@127.0.0.1/codd_experiment
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable

This configuration is loaded into the current shell using the following command.

$ source config/codd-dev.sh
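
For non-development environments, an analogous file could take the credentials from the environment or a secrets manager instead of the repository. The following is only a sketch: the codd-prod.sh name, the db.internal host, and the CODD_DB_PASSWORD variable are hypothetical.

# config/codd-prod.sh (hypothetical): CODD_DB_PASSWORD is injected by the
# deployment environment and is never committed to the repository.
export CODD_CONNECTION="postgres://codd:${CODD_DB_PASSWORD}@db.internal/codd_experiment"
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable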

Codd Initialization

With normal Codd usage, the first migration creates the database, which requires a privileged user. This does not work in an environment where the database, role, and permissions are configured externally, so I need to find a different way to initialize databases.

When Codd is run for the first time against a given database, it creates the codd_schema.sql_migrations table that is used to track which migrations have been applied to the database, and it initializes the on-disk schema representation. After being sidetracked by a number of bogus error messages, I finally discovered that a no-op migration like the following does what I want. The no-txn configuration is required, a custom connection string is not required, and at least one statement is required. The following migration is stored in migrations/0001-init-codd.sql.

-- codd: no-txn

SELECT 1;

The following commands apply the initial migration and create the link in the migrations directory. I am linking manually in this experiment, but it could of course be done programmatically.

$ ./bin/codd add migrations/0001-init-codd.sql
$ ln -s ../var/codd-migrations/2023-03-23-22-25-05-0001-init-codd.sql \
    migrations/0001-init-codd.sql
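
A small shell function along these lines could automate the add-and-link step. This is just a sketch: the add_migration name is mine, and it relies on the behavior seen above, where codd add moves the file into var/codd-migrations with a timestamp prefix.

# Hypothetical helper: add a migration with Codd, then link the timestamped
# file that Codd created back into the migrations directory.
add_migration() {
  local src="$1"                   # e.g. migrations/0001-init-codd.sql
  local base
  base="$(basename "$src")"
  ./bin/codd add "$src"
  # Find the newest file in var/codd-migrations that ends with the original name.
  local applied
  applied="$(ls -t var/codd-migrations/*-"$base" | head -n 1)"
  ln -s "../$applied" "migrations/$base"
}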

The following command confirms that Codd initialization is complete.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
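
For the curious, the tracking table mentioned earlier can also be inspected directly (output omitted here).

$ echo "SELECT * FROM codd_schema.sql_migrations;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd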

Schema Initialization

The next migration initializes the test database schema. This test schema follows the example in the README, with some minor differences. The schema follows Beam (and beam-automigrate) naming conventions, but the employee_name column is incorrectly prefixed with the table name; this will be fixed in a later migration. Note that the employee_name column is used in an index.

CREATE SEQUENCE employee___id___seq;

CREATE TABLE employee (
  id INTEGER NOT NULL DEFAULT nextval('employee___id___seq'),
  employee_name TEXT NOT NULL,
  CONSTRAINT employee_pkey PRIMARY KEY (id)
);

ALTER SEQUENCE employee___id___seq
  OWNED BY employee.id;

CREATE UNIQUE INDEX employee_lower_employee_name_unique
  ON employee(lower(employee_name));

The following commands apply this migration and create the link in the migrations directory.

$ ./bin/codd add migrations/0002-init-schema.sql
$ ln -s ../var/codd-migrations/2023-03-23-22-25-31-0002-init-schema.sql \
    migrations/0002-init-schema.sql

Test Data

The following command inserts a test row into the database.

$ echo "INSERT INTO employee (employee_name) VALUES ('tcard');" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd

The test data can be checked using the following command.

$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | employee_name
----+---------------
  1 | tcard
(1 row)

Multi-Step Migration

If software is using the employee_name column, then simply renaming the column to name causes the software to fail. Renaming therefore requires changes to the software in addition to schema migration. There is more than one way to do this. In this test, I experiment with a multi-step migration based on the blue-green-safe example. Please note that production migrations should be well tested before use. I am not properly testing these migrations, as they are for experimental purposes only.

First, the schema is migrated to an intermediate state that is compatible with both the old and new versions of the software. This migration step, stored in migrations/0003.1-rename-employee-name.sql, adds the name column, copies the data from the employee_name column, and creates triggers to keep the two columns in sync.

ALTER TABLE employee ADD COLUMN name TEXT;

UPDATE employee SET name = employee_name;

ALTER TABLE employee
  ADD CONSTRAINT employee_name_rename_equal_check
    CHECK (name IS NOT DISTINCT FROM employee_name);

CREATE FUNCTION
  employee_name_rename_set_new()
  RETURNS TRIGGER
  AS $$
    BEGIN
      NEW.name = NEW.employee_name;
      RETURN NEW;
    END
  $$ LANGUAGE plpgsql;

CREATE TRIGGER employee_name_old_insert_trigger
  BEFORE INSERT ON employee
  FOR EACH ROW
  WHEN (NEW.name IS NULL)
  EXECUTE FUNCTION employee_name_rename_set_new();

CREATE TRIGGER employee_name_old_update_trigger
  BEFORE UPDATE ON employee
  FOR EACH ROW
  WHEN (OLD.employee_name IS DISTINCT FROM NEW.employee_name)
  EXECUTE FUNCTION employee_name_rename_set_new();

CREATE FUNCTION
  employee_name_rename_set_old()
  RETURNS TRIGGER
  AS $$
    BEGIN
      NEW.employee_name = NEW.name;
      RETURN NEW;
    END
  $$ LANGUAGE plpgsql;

CREATE TRIGGER employee_name_new_insert_trigger
  BEFORE INSERT ON employee
  FOR EACH ROW
  WHEN (NEW.employee_name IS NULL)
  EXECUTE FUNCTION employee_name_rename_set_old();

CREATE TRIGGER employee_name_new_update_trigger
  BEFORE UPDATE ON employee
  FOR EACH ROW
  WHEN (OLD.name IS DISTINCT FROM NEW.name)
  EXECUTE FUNCTION employee_name_rename_set_old();

The software can then be upgraded to the new version. This may be a phased/gradual rollout that allows testing of the new version in production before rolling out to all instances. During this time, some software still uses employee_name while upgraded software uses name.

If there is an issue with the new version, the upgraded instances are downgraded to the old version. The inverse of the first migration step, stored in migrations/0003.1-rename-employee-name.inverse.sql, is applied to restore the schema to the previous state.

DROP TRIGGER employee_name_new_insert_trigger ON employee;
DROP TRIGGER employee_name_new_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_old;

DROP TRIGGER employee_name_old_insert_trigger ON employee;
DROP TRIGGER employee_name_old_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_new;

ALTER TABLE employee
  DROP CONSTRAINT employee_name_rename_equal_check,
  DROP COLUMN name;
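
In this experiment the rollback path is not exercised, but the inverse would presumably be applied like any other Codd migration, for example:

$ ./bin/codd add migrations/0003.1-rename-employee-name.inverse.sql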

Otherwise, after all software has been upgraded, the schema is migrated to the target state. All of the software uses name, so compatibility with employee_name is no longer required. This migration step is stored in migrations/0003.2-rename-employee-name.sql. Note that the name column is dropped and the employee_name column is then renamed to name so that any references to the column are maintained; in this test, the column is used in an index. The index has to be renamed separately, of course.

DROP TRIGGER employee_name_new_insert_trigger ON employee;
DROP TRIGGER employee_name_new_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_old;

DROP TRIGGER employee_name_old_insert_trigger ON employee;
DROP TRIGGER employee_name_old_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_new;

ALTER TABLE employee
  DROP CONSTRAINT employee_name_rename_equal_check,
  DROP COLUMN name;
ALTER TABLE employee
  RENAME COLUMN employee_name TO name;

ALTER INDEX employee_lower_employee_name_unique
  RENAME TO employee_lower_name_unique;

The inverse of the second migration step, stored in migrations/0003.2-rename-employee-name.inverse.sql, can be used to migrate the schema back to the intermediate state, since reverting a multi-step migration also requires multiple steps. It renames the index and column back, and then it does the same as the first migration step.

ALTER INDEX employee_lower_name_unique
  RENAME TO employee_lower_employee_name_unique;

ALTER TABLE employee
  RENAME COLUMN name TO employee_name;

...

The following commands apply and link the first step of the migration.

$ ./bin/codd add migrations/0003.1-rename-employee-name.sql
$ ln -s \
    ../var/codd-migrations/2023-03-23-22-26-42-0003.1-rename-employee-name.sql \
    migrations/0003.1-rename-employee-name.sql

In existing rows, the employee_name data is indeed copied to name.

$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | employee_name | name
----+---------------+-------
  1 | tcard         | tcard
(1 row)

To test the triggers, the following commands add new rows using both columns and display the results.

$ echo "INSERT INTO employee (employee_name) VALUES ('alice');" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "INSERT INTO employee (name) VALUES ('bob');" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | employee_name | name
----+---------------+-------
  1 | tcard         | tcard
  2 | alice         | alice
  3 | bob           | bob
(3 rows)
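
Although it is not part of this experiment, a write that sets the two columns to different values should be rejected: neither trigger fires when both columns are supplied, so the equality check constraint catches the divergence. A hypothetical example (output not shown):

$ echo "INSERT INTO employee (employee_name, name) VALUES ('dave', 'eve');" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd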

For this test, I assume that the hypothetical software upgrade is successful. The following commands apply and link the second step of the migration.

$ ./bin/codd add migrations/0003.2-rename-employee-name.sql
$ ln -s \
    ../var/codd-migrations/2023-03-23-22-31-40-0003.2-rename-employee-name.sql \
    migrations/0003.2-rename-employee-name.sql

The following command confirms the existing data.

$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | name
----+-------
  1 | tcard
  2 | alice
  3 | bob
(3 rows)

The following commands add a new row using the new column and display the results.

$ echo "INSERT INTO employee (name) VALUES ('carol');" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | name
----+-------
  1 | tcard
  2 | alice
  3 | bob
  4 | carol
(4 rows)

The migration is complete. Note that the inverses are not used in this case, but they are still available in the migrations directory.

Different Database Initialization

I would now like to test initialization of a new database using Codd.

Just for the sake of the test, I also copy the data from the current database to the new one. The following command dumps the data from the current database to file data.sql, excluding codd_schema.

$ docker exec pg_codd \
    pg_dump \
      --host 127.0.0.1 \
      --port 5432 \
      --username codd \
      --no-password \
      --data-only \
      --exclude-schema=codd_schema \
      --quote-all-identifiers \
      codd_experiment \
  > data.sql

The following commands stop the current database and run a new one.

$ docker stop pg_codd
$ docker rm pg_codd
$ docker run \
    --detach \
    --name "pg_codd" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="codd_experiment" \
    --env POSTGRES_USER="codd" \
    --env POSTGRES_PASSWORD="ddoc" \
    "postgres:15.2-alpine"

The following command runs all of the applied migrations on the new database.

$ ./bin/codd up
[Info] Checking if database 'codd_experiment' is accessible with the configured connection string... (waiting up to 5sec)
[Info] Checking which SQL migrations have already been applied...
[Info] Parse-checking headers of all pending SQL Migrations...
[Info] Creating codd_schema...
[Info] Applying 2023-03-23-22-25-05-0001-init-codd.sql
[Info] BEGINning transaction
[Info] Applying 2023-03-23-22-25-31-0002-init-schema.sql
[Info] Applying 2023-03-23-22-26-42-0003.1-rename-employee-name.sql
[Info] Applying 2023-03-23-22-31-40-0003.2-rename-employee-name.sql
[Info] COMMITed transaction
[Info] Database and expected schemas match.
[Info] All migrations applied to codd_experiment successfully

Note that multiple migrations are run within the same transaction. Nice!

The following command restores the data.

$ docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd < data.sql

The following command verifies the schema.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.

Finally, the following command confirms the data.

$ echo "SELECT * FROM employee;" \
  | docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
 id | name
----+-------
  1 | tcard
  2 | alice
  3 | bob
  4 | carol
(4 rows)

Conclusion

All goals were achieved.

The test migrations do not create the database, role, or permissions.

Local development credentials can (optionally) be stored in the repository (in config/codd-dev.sh), and credentials for other environments can be configured within those environments.

Multi-step migrations and inverse migrations are managed elegantly. All migrations are in the migrations directory, including those that have not (yet) been applied. In most environments, symbolic links are displayed in a different color, so it is very easy to see which migrations have been applied. In other environments, the ls --indicator-style=file-type option can be used to display an at sign (@) suffix on migrations that have been applied.

$ ls -1 --indicator-style=file-type migrations
0001-init-codd.sql@
0002-init-schema.sql@
0003.1-rename-employee-name.inverse.sql
0003.1-rename-employee-name.sql@
0003.2-rename-employee-name.inverse.sql
0003.2-rename-employee-name.sql@

This experiment was successful!