Codd Experiment 2: PostgreSQL Upgrade
Codd is a utility for managing PostgreSQL migrations, written in Haskell. In this blog entry, I experiment with how Codd works when PostgreSQL is upgraded.
Codd represents the state of a database at a given time using JSON,
either organized hierarchically by key using many files or in a single
JSON file. Migrations are the arrows/morphisms between these database
states. The state data is relatively low-level, and
Codd.Representations.Database
sub-modules implement
specifics for different versions of PostgreSQL. Does this mean that a
database will not verify after an upgrade?
Test Project
For this experiment, I use a temporary test project. It is similar to the test project that I used in the first Codd Experiment, and significant differences are noted below.
$ mkdir /tmp/codd-pg-upgrade
$ cd /tmp/codd-pg-upgrade
As in the first experiment, I use the static executable, available from the releases page.
$ mkdir bin
$ wget -O bin/codd \
https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd
As in the first experiment, I put the Codd-managed directories under
a var
directory. In this experiment, however, separate
directories are used each version of PostgreSQL.
$ mkdir -p var/codd-{migrations,schema}-{10,11}
In the first experiment, I managed all migrations under a
migrations
directory and linked to applied migrations that
Codd moved to the Codd-managed directory. In this experiment, I just use
the migrations
directory to store migrations that are
applied to each database, making copies before adding them so that Codd
does not (re)move the original migration files.
$ mkdir migrations
As in the first experiment, the config
directory stores
configuration.
$ mkdir config
In this experiment, however, the configuration script is parameterized by the PostgreSQL version.
#!/usr/bin/env bash
if [ $# -ne 1 ] ; then
echo "usage: $0 PG_VERSION" 1>&2
return 2
fi
export CODD_CONNECTION="postgres://postgres:password@127.0.0.1/postgres"
export CODD_EXPECTED_SCHEMA_DIR="var/codd-schema-$1"
export CODD_MIGRATION_DIRS="var/codd-migrations-$1"
export CODD_TXN_ISOLATION="serializable"
Test Server
For this experiment, I use postgres
containers to run test PostgreSQL servers of different versions. Codd
currently supports PostgreSQL versions 10 through 15. Note that images
for PostgreSQL 10 are no longer supported, but they are still available
for download.
Instead of upgrading the database, I simply start a new database and apply the same schema. Since only one PostgreSQL version is used at a time, I run all of them in the same way, just specifying different images.
PostgreSQL 10
I start with PostgreSQL 10, the oldest supported version.
The following command starts a test server using PostgreSQL 10.
$ docker run \
--detach \
--name "pg_codd" \
--publish "127.0.0.1:5432:5432" \
--env POSTGRES_DB="postgres" \
--env POSTGRES_USER="postgres" \
--env POSTGRES_PASSWORD="password" \
"postgres:10-alpine"
The following command configures the environment.
$ source config/codd-dev.sh 10
As described in the first Codd Experiment, a no-op migration is needed as a first migration.
-- codd: no-txn
SELECT 1;
The second migration initializes a schema. This is the test schema that I used in issue #133.
CREATE TABLE public.employee (
id SERIAL NOT NULL,
NOT NULL,
name TEXT CONSTRAINT employee_pkey PRIMARY KEY (id)
);
CREATE TABLE public.email (
INTEGER NOT NULL,
employee__id NOT NULL,
address TEXT CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
CONSTRAINT email_employee__id_fkey
FOREIGN KEY (employee__id)
REFERENCES public.employee(id)
);
CREATE UNIQUE INDEX email_lower_address_unique
ON public.email(lower(address));
The following commands add the migration. The migration files are copied so that the original files are not (re)moved.
$ cp migrations/* .
$ ./bin/codd add 0001-init-codd.sql
$ ./bin/codd add 0002-init-schema.sql
The database is consistent.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
The PostgreSQL 10 database is no longer needed. The follow command stops and removes it, discarding the data.
$ docker stop pg_codd | xargs docker rm
PostgreSQL 11
The following command starts a test server using PostgreSQL 11.
$ docker run \
--detach \
--name "pg_codd" \
--publish "127.0.0.1:5432:5432" \
--env POSTGRES_DB="postgres" \
--env POSTGRES_USER="postgres" \
--env POSTGRES_PASSWORD="password" \
"postgres:11-alpine"
The following command configures the environment.
$ source config/codd-dev.sh 11
The same migrations are added. The migration files are copied so that the original files are not (re)moved.
$ cp migrations/* .
$ ./bin/codd add 0001-init-codd.sql
$ ./bin/codd add 0002-init-schema.sql
The database is consistent.
$ ./bin/codd verify-schema
[Info] Database and expected schemas match.
Does the PostgreSQL 11 database verify using the PostgreSQL 10 schema state?
$ env CODD_EXPECTED_SCHEMA_DIR="var/codd-schema-10" \
./bin/codd verify-schema \
| sed 's/^[^{]*//' \
| python -m json.tool
{
"schemas/public/tables/email/constraints/email_employee__id_fkey": [
"different-schemas",
{
"deferrable": false,
"deferred": false,
"definition": "FOREIGN KEY (employee__id) REFERENCES employee(id)",
"fk_deltype": "a",
"fk_matchtype": "s",
"fk_ref_table": "employee",
"fk_updtype": "a",
"inhcount": 0,
"local": true,
"noinherit": true,
"parent_constraint": null,
"supporting_index": "employee_pkey",
"type": "f",
"validated": true
}
],
"schemas/public/tables/email/constraints/email_pkey": [
"different-schemas",
{
"deferrable": false,
"deferred": false,
"definition": "PRIMARY KEY (employee__id, address)",
"fk_deltype": " ",
"fk_matchtype": " ",
"fk_ref_table": null,
"fk_updtype": " ",
"inhcount": 0,
"local": true,
"noinherit": true,
"parent_constraint": null,
"supporting_index": "email_pkey",
"type": "p",
"validated": true
}
],
"schemas/public/tables/employee/constraints/employee_pkey": [
"different-schemas",
{
"deferrable": false,
"deferred": false,
"definition": "PRIMARY KEY (id)",
"fk_deltype": " ",
"fk_matchtype": " ",
"fk_ref_table": null,
"fk_updtype": " ",
"inhcount": 0,
"local": true,
"noinherit": true,
"parent_constraint": null,
"supporting_index": "employee_pkey",
"type": "p",
"validated": true
}
]
}
There are three discrepancies!
How is employee_pkey
different? Here is the PostgreSQL
10 state:
$ cat var/codd-schema-10/schemas/public/tables/employee/constraints/employee_pkey \
| python -m json.tool
{
"deferrable": false,
"deferred": false,
"definition": "PRIMARY KEY (id)",
"fk_deltype": " ",
"fk_matchtype": " ",
"fk_ref_table": null,
"fk_updtype": " ",
"inhcount": 0,
"local": true,
"noinherit": true,
"supporting_index": "employee_pkey",
"type": "p",
"validated": true
}
The PostgreSQL 11 version includes a parent_constraint
property that does not exist in PostgreSQL 10. The other discrepancies
have the exact same cause. Indeed,
Codd.Representations.Database.Pg11
deals with this
property.
The PostgreSQL 11 database is no longer needed. The follow command stops and removes it, discarding the data.
$ docker stop pg_codd | xargs docker rm
I expected to test more versions, but doing so is not necessary.
Conclusion
The database indeed does not verify after a PostgreSQL upgrade. What can be done in this case?
One option is to write the upgraded state to a new directory. This can be done using a command like the following.
$ ./bin/codd write-schema --dest-folder var/codd-schema-updated
It is up to administrators to verify that the new database state only contains differences due to the PostgreSQL upgrade. This would be tedious in a large database, so tooling would be very helpful. Perhaps such tooling could be built into Codd. Without tooling, one could make copies of the two directories, format all of the JSON in the copies, and use a graphical diff tool to compare them. For example, the following commands can be used to review the differences using Meld.
$ mkdir tmp
$ cp -r var/codd-schema-10 var/codd-schema-11 tmp
$ find tmp -type f -exec python -m json.tool {} {}.json \; -delete
$ meld tmp/codd-schema-10 tmp/codd-schema-11
Not all environments are upgraded at the same time, so one must be careful to use the correct database state for each environment. The database state using the old version of PostgreSQL can be removed only after all environments have been upgraded.
Since database state is version-specific, specifying the version in
the name (like codd-schema-11
) might be a good idea in
general!