Codd Experiment
Codd is a utility for managing PostgreSQL migrations, written in Haskell. There are some things that I want to do that Codd is not designed for, and this blog entry describes an experiment with implementing them. Please note that this is not the normal way to use Codd.
The following are my goals for this experiment:
- I want to use Codd in environments where the database, role, and permissions are configured externally. Codd should never create or configure these things; it should just manage whatever database is specified by the connection string.
- I want to use Codd in environments that use password authentication. While it is fine to include default credentials for local development environments in the repository, it is critical that credentials for other environments are not leaked.
- I want to manage multi-step migrations as well as (optional) migration inverses. Defining multiple steps and inverses as separate Codd migrations is not a problem. They should be well organized, however, and they should not be lost even if they are never used.
Test Server
For this test, I use a postgres container to run a test PostgreSQL server. The following command starts the test server in a container named pg_codd. The codd_experiment database is created with owner codd, authenticated using a demonstration password that is the reverse of the role name.
$ docker run \
--detach \
--name "pg_codd" \
--publish "127.0.0.1:5432:5432" \
--env POSTGRES_DB="codd_experiment" \
--env POSTGRES_USER="codd" \
--env POSTGRES_PASSWORD="ddoc" \
"postgres:15.2-alpine"
The following command confirms that the database is ready. It is executed within the container using a trusted connection, so no password is required.
$ echo "SELECT 1 AS ok;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
ok
----
1
(1 row)
Test Project
For this test, I use a temporary test project.
$ mkdir /tmp/codd-experiment
$ cd /tmp/codd-experiment
A static executable is available from the releases page.
$ mkdir bin
$ wget -O bin/codd \
https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd
Codd manages two directories. One directory contains migrations that have been applied, and the other contains the state of the database after all of those migrations have been applied. In the Codd repository and documentation, these directories are stored in the project directory, named sql-migrations and expected-schema respectively.
Since multi-step migrations and migration inverses may not all be applied, I need to use a separate directory to manage all of them. This directory needs to be easily distinguished from the Codd-managed directories. My solution is to put the managed directories under a var directory. (Directory /var in UNIX filesystems is used to store “variable data files.” It is part of the Filesystem Hierarchy Standard. UNIX folks recognize var as a directory that is generally managed by software.) I prefix them with codd- to make it clear what software manages them, which may not be obvious in a large project that uses lots of software.
$ mkdir -p var/{codd-migrations,codd-schema}
The migrations directory contains all of the migrations. It is managed by developers, not Codd. When a migration is applied, Codd moves it to the var/codd-migrations directory. My idea is to then create a link to the file from the migrations directory. This allows developers to see all migrations from the migrations directory as well as easily see which ones have been applied.
$ mkdir migrations
I plan to name migration files like NNNN(.S)-DESCRIPTION(.inverse).sql. NNNN is a zero-padded, four-digit index, starting from 1. S is an optional migration step number, starting from 1 for each index. DESCRIPTION is a concise description of the migration. The .inverse.sql extension indicates that the migration is an inverse of the migration without .inverse in the name. (Note that the term inverse is borrowed from mathematics: a migration composed with its inverse is congruent to the identity function. In other words, a migration inverse restores the schema to its state before the migration.)
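To make the naming pattern concrete, the migration names used later in this experiment look like the following.
0001-init-codd.sql
0002-init-schema.sql
0003.1-rename-employee-name.sql
0003.1-rename-employee-name.inverse.sql
0003.2-rename-employee-name.sql
0003.2-rename-employee-name.inverse.sql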
The config directory stores configuration.
$ mkdir config
File config/codd-dev.sh defines the environment variables used by Codd in development environments (local environments on developer computers). The connection string includes the development password. (Note that the password for non-development environments must never be stored in the repository.) The Codd-managed directories are configured, and the default transaction isolation level is set to serializable for this test.
export CODD_CONNECTION=postgres://codd:ddoc@127.0.0.1/codd_experiment
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable
This configuration is loaded into the current shell using the following command.
$ source config/codd-dev.sh
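For comparison, a non-development configuration might look something like the following sketch. This file is not used in the experiment, and the host name and PGPASSWORD_CODD variable are hypothetical placeholders; the point is only that the password is supplied by the environment (for example by a secret manager) rather than stored in the repository.
# config/codd-prod.sh (illustrative sketch only)
# The password is provided by the environment, never by the repository.
export CODD_CONNECTION="postgres://codd:${PGPASSWORD_CODD:?not set}@db.example.internal/codd_experiment"
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable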
Codd Initialization
With normal Codd usage, the first migration creates the database, which requires a privileged user. This does not work in an environment where the database, role, and permissions are configured externally, so I need to find a different way to initialize databases.
When Codd is run for the first time on a given database, it creates the codd_schema.sql_migrations table that is used to track which migrations have been applied to the database, and it initializes the on-disk schema representation. After being sidetracked by a number of bogus error messages, I finally discovered that a no-op migration like the following does what I want. The no-txn configuration is required, a custom connection string is not required, and at least one statement is required. The following migration is stored in migrations/0001-init-codd.sql.
-- codd: no-txn
SELECT 1;
The following commands apply the initial migration and create the link in the migrations directory. I am linking manually in this experiment, but it could of course be done programmatically; a sketch of that follows the commands.
$ ./bin/codd add migrations/0001-init-codd.sql
$ ln -s ../var/codd-migrations/2023-03-23-22-25-05-0001-init-codd.sql \
migrations/0001-init-codd.sql
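As a rough sketch of what that automation might look like, the following script applies a migration and links it back under its original name. It is untested, it assumes that the CODD_* variables from config/codd-dev.sh are already exported, and it assumes that codd add keeps the original file name as a suffix of the timestamped name, as seen above. The name add-migration.sh is just an example.
#!/bin/sh
# add-migration.sh: apply a migration with Codd and link it back (sketch).
set -eu

src="$1"                      # e.g. migrations/0001-init-codd.sql
name="$(basename "$src")"

# Codd moves the file into var/codd-migrations with a timestamp prefix.
./bin/codd add "$src"

# Find the file Codd just wrote and link it back under its original name.
applied="$(ls -1 var/codd-migrations/*-"$name" | tail -n 1)"
ln -s "../$applied" "migrations/$name"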
The following command confirms that Codd initialization is complete.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
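Out of curiosity, the bookkeeping table can also be inspected directly using the same docker exec pattern as before. I omit the output here, since its exact columns are an implementation detail of Codd.
$ echo "SELECT * FROM codd_schema.sql_migrations;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd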
Schema Initialization
The next migration initializes the test database schema. This test schema follows the example in the README, with some minor differences. The schema follows Beam (and beam-automigrate) naming conventions, but the employee_name column is incorrectly prefixed with the table name, to be fixed in a later migration. Note that employee_name is used in an index. This migration is stored in migrations/0002-init-schema.sql.
CREATE SEQUENCE employee___id___seq;
CREATE TABLE employee (
id INTEGER NOT NULL DEFAULT nextval('employee___id___seq'),
employee_name TEXT NOT NULL,
CONSTRAINT employee_pkey PRIMARY KEY (id)
);
ALTER SEQUENCE employee___id___seq
OWNED BY employee.id;
CREATE UNIQUE INDEX employee_lower_employee_name_unique
ON employee(lower(employee_name));
The following commands apply this migration and create the link in the migrations directory.
$ ./bin/codd add migrations/0002-init-schema.sql
$ ln -s ../var/codd-migrations/2023-03-23-22-25-31-0002-init-schema.sql \
migrations/0002-init-schema.sql
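To eyeball the resulting table definition, the column catalog can be queried through the same docker exec pattern. The output (omitted here) should list the id and employee_name columns.
$ echo "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'employee';" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd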
Test Data
The following command inserts a test row into the database.
$ echo "INSERT INTO employee (employee_name) VALUES ('tcard');" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
The test data can be checked using the following command.
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | employee_name
----+---------------
1 | tcard
(1 row)
Multi-Step Migration
If software is using the employee_name column, then simply renaming the column to name causes the software to fail. Renaming therefore requires changes to the software in addition to schema migration. There is more than one way to do this. In this test, I experiment with a multi-step migration based on the blue-green-safe example. Please note that production migrations should be well tested before use. I am not properly testing these migrations, as they are for experimental purposes only.
First, the schema is migrated to an intermediate state that is compatible with both the old and new versions of the software. This migration step, stored in migrations/0003.1-rename-employee-name.sql, adds the name column, copies the data from the employee_name column, and creates triggers to keep the two columns in sync.
ALTER TABLE employee ADD COLUMN name TEXT;
UPDATE employee SET name = employee_name;
ALTER TABLE employee
ADD CONSTRAINT employee_name_rename_equal_check
CHECK (name IS NOT DISTINCT FROM employee_name);
CREATE FUNCTION employee_name_rename_set_new()
RETURNS TRIGGER AS $$
BEGIN
NEW.name = NEW.employee_name;
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER employee_name_old_insert_trigger
BEFORE INSERT ON employee
FOR EACH ROW
WHEN (NEW.name IS NULL)
EXECUTE FUNCTION employee_name_rename_set_new();
CREATE TRIGGER employee_name_old_update_trigger
BEFORE UPDATE ON employee
FOR EACH ROW
WHEN (OLD.employee_name IS DISTINCT FROM NEW.employee_name)
EXECUTE FUNCTION employee_name_rename_set_new();
CREATE FUNCTION employee_name_rename_set_old()
RETURNS TRIGGER AS $$
BEGIN
NEW.employee_name = NEW.name;
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER employee_name_new_insert_trigger
BEFORE INSERT ON employee
FOR EACH ROW
WHEN (NEW.employee_name IS NULL)
EXECUTE FUNCTION employee_name_rename_set_old();
CREATE TRIGGER employee_name_new_update_trigger
BEFORE UPDATE ON employee
FOR EACH ROW
WHEN (OLD.name IS DISTINCT FROM NEW.name)
EXECUTE FUNCTION employee_name_rename_set_old();
The software can then be upgraded to the new version. This may be a phased/gradual rollout that allows testing of the new version in production before rolling it out to all instances. During this time, some software still uses employee_name while upgraded software uses name.
If there is an issue with the new version, the upgraded instances are downgraded to the old version. The inverse of the first migration step, stored in 0003.1-rename-employee-name.inverse.sql, is applied to restore the schema to the previous state.
DROP TRIGGER employee_name_new_insert_trigger ON employee;
DROP TRIGGER employee_name_new_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_old;
DROP TRIGGER employee_name_old_insert_trigger ON employee;
DROP TRIGGER employee_name_old_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_new;
ALTER TABLE employee
DROP CONSTRAINT employee_name_rename_equal_check,
DROP COLUMN name;
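The rollback path is not exercised in this experiment, but if it were needed, the inverse would be applied and then linked back into migrations exactly like any other migration, for example:
$ ./bin/codd add migrations/0003.1-rename-employee-name.inverse.sql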
Otherwise, after all software has been upgraded, the schema is migrated to the target state. All of the software uses name, so compatibility with employee_name is not required. This migration step is stored in 0003.2-rename-employee-name.sql. Note that the name column is dropped and employee_name is then renamed to name, so that any references to the column are maintained. In this test, the column is used in an index. The index has to be renamed separately, of course.
DROP TRIGGER employee_name_new_insert_trigger ON employee;
DROP TRIGGER employee_name_new_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_old;
DROP TRIGGER employee_name_old_insert_trigger ON employee;
DROP TRIGGER employee_name_old_update_trigger ON employee;
DROP FUNCTION employee_name_rename_set_new;
ALTER TABLE employee
DROP CONSTRAINT employee_name_rename_equal_check,
DROP COLUMN name;
ALTER TABLE employee
RENAME COLUMN employee_name TO name;
ALTER INDEX employee_lower_employee_name_unique
RENAME TO employee_lower_name_unique;
The inverse of the second migration step, stored in 0003.2-rename-employee-name.inverse.sql, can be used to migrate the schema back to the intermediate state, as reverting a multi-step migration also requires multiple steps. It renames the column and index, and then it does the same as the first migration step.
ALTER INDEX employee_lower_name_unique
RENAME TO employee_lower_employee_name_unique;
ALTER TABLE employee
RENAME COLUMN name to employee_name;
...
The following commands apply and link the first step of the migration.
$ ./bin/codd add migrations/0003.1-rename-employee-name.sql
$ ln -s \
../var/codd-migrations/2023-03-23-22-26-42-0003.1-rename-employee-name.sql \
migrations/0003.1-rename-employee-name.sql
In existing rows, the employee_name data is indeed copied to name.
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | employee_name | name
----+---------------+-------
1 | tcard | tcard
(1 row)
To test the triggers, the following commands add new rows using both columns and display the results.
$ echo "INSERT INTO employee (employee_name) VALUES ('alice');" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "INSERT INTO employee (name) VALUES ('bob');" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | employee_name | name
----+---------------+-------
1 | tcard | tcard
2 | alice | alice
3 | bob | bob
(3 rows)
For this test, I assume that the hypothetical software upgrade is successful. The following commands apply and link the second step of the migration.
$ ./bin/codd add migrations/0003.2-rename-employee-name.sql
$ ln -s \
../var/codd-migrations/2023-03-23-22-31-40-0003.2-rename-employee-name.sql \
migrations/0003.2-rename-employee-name.sql
The following command confirms the existing data.
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | name
----+-------
1 | tcard
2 | alice
3 | bob
(3 rows)
The following commands add a new row using the new column and display the results.
$ echo "INSERT INTO employee (name) VALUES ('carol');" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | name
----+-------
1 | tcard
2 | alice
3 | bob
4 | carol
(4 rows)
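As a quick sanity check that the renamed unique index still works, inserting a case-variant duplicate name should be rejected. I do not show the output here; it should fail with a unique violation on employee_lower_name_unique, because 'TCARD' lowercases to the existing 'tcard'.
$ echo "INSERT INTO employee (name) VALUES ('TCARD');" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd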
The migration is complete. Note that the inverses are not used in this case, but they are still available in the migrations directory.
Different Database Initialization
I would now like to test initialization of a new database using Codd. Just for the sake of the test, I would like to copy the data from the current database to the new database. The following command dumps the data from the current database to file data.sql, excluding codd_schema.
$ docker exec pg_codd \
pg_dump \
--host 127.0.0.1 \
--port 5432 \
--username codd \
--no-password \
--data-only \
--exclude-schema=codd_schema \
--quote-all-identifiers \
codd_experiment \
> data.sql
The following commands stop the current database and run a new one.
$ docker stop pg_codd
$ docker rm pg_codd
$ docker run \
--detach \
--name "pg_codd" \
--publish "127.0.0.1:5432:5432" \
--env POSTGRES_DB="codd_experiment" \
--env POSTGRES_USER="codd" \
--env POSTGRES_PASSWORD="ddoc" \
"postgres:15.2-alpine"
The following command runs all of the applied migrations on the new database.
$ ./bin/codd up
[Info] Checking if database 'codd_experiment' is accessible with the configured connection string... (waiting up to 5sec)
[Info] Checking which SQL migrations have already been applied...
[Info] Parse-checking headers of all pending SQL Migrations...
[Info] Creating codd_schema...
[Info] Applying 2023-03-23-22-25-05-0001-init-codd.sql
[Info] BEGINning transaction
[Info] Applying 2023-03-23-22-25-31-0002-init-schema.sql
[Info] Applying 2023-03-23-22-26-42-0003.1-rename-employee-name.sql
[Info] Applying 2023-03-23-22-31-40-0003.2-rename-employee-name.sql
[Info] COMMITed transaction
[Info] Database and expected schemas match.
[Info] All migrations applied to codd_experiment successfully
Note that the no-txn initialization migration is applied on its own, while the remaining migrations are all run within the same transaction. Nice!
The following command restores the data.
$ docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd < data.sql
The following command verifies the schema.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
Finally, the following command confirms the data.
$ echo "SELECT * FROM employee;" \
| docker exec -i pg_codd psql --host 127.0.0.1 codd_experiment codd
id | name
----+-------
1 | tcard
2 | alice
3 | bob
4 | carol
(4 rows)
Conclusion
All goals were achieved.
The test migrations do not create the database, role, or permissions.
Local development credentials can (optionally) be stored in the repository (in config/codd-dev.sh), and credentials for other environments can be configured within those environments.
Multi-step migrations and inverse migrations are managed elegantly. All migrations are in the migrations directory, including those that have not been applied (yet). In most environments, links are displayed using a different color, so it is very easy to see which migrations have been applied. In other environments, the ls --indicator-style=file-type option can be used to display an at sign (@) suffix on migrations that have been applied.
$ ls -1 --indicator-style=file-type migrations
0001-init-codd.sql@
0002-init-schema.sql@
0003.1-rename-employee-name.inverse.sql
0003.1-rename-employee-name.sql@
0003.2-rename-employee-name.inverse.sql
0003.2-rename-employee-name.sql@
This experiment was successful!
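For completeness, the test container and the temporary project can be cleaned up afterwards.
$ docker stop pg_codd
$ docker rm pg_codd
$ rm -rf /tmp/codd-experiment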