Skip to main content

Codd Experiment 2: PostgreSQL Upgrade

Codd is a utility for managing PostgreSQL migrations, written in Haskell. In this blog entry, I experiment with how Codd works when PostgreSQL is upgraded.

Codd represents the state of a database at a given time using JSON, either organized hierarchically by key using many files or in a single JSON file. Migrations are the arrows/morphisms between these database states. The state data is relatively low-level, and Codd.Representations.Database sub-modules implement specifics for different versions of PostgreSQL. Does this mean that a database will not verify after an upgrade?

Test Project

For this experiment, I use a temporary test project. It is similar to the test project that I used in the first Codd Experiment, and significant differences are noted below.

$ mkdir /tmp/codd-pg-upgrade
$ cd /tmp/codd-pg-upgrade

As in the first experiment, I use the static executable, available from the releases page.

$ mkdir bin
$ wget -O bin/codd \
    https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd

As in the first experiment, I put the Codd-managed directories under a var directory. In this experiment, however, separate directories are used each version of PostgreSQL.

$ mkdir -p var/codd-{migrations,schema}-{10,11}

In the first experiment, I managed all migrations under a migrations directory and linked to applied migrations that Codd moved to the Codd-managed directory. In this experiment, I just use the migrations directory to store migrations that are applied to each database, making copies before adding them so that Codd does not (re)move the original migration files.

$ mkdir migrations

As in the first experiment, the config directory stores configuration.

$ mkdir config

In this experiment, however, the configuration script is parameterized by the PostgreSQL version.

#!/usr/bin/env bash

if [ $# -ne 1 ] ; then
  echo "usage: $0 PG_VERSION" 1>&2
  return 2
fi

export CODD_CONNECTION="postgres://postgres:password@127.0.0.1/postgres"
export CODD_EXPECTED_SCHEMA_DIR="var/codd-schema-$1"
export CODD_MIGRATION_DIRS="var/codd-migrations-$1"
export CODD_TXN_ISOLATION="serializable"

Test Server

For this experiment, I use postgres containers to run test PostgreSQL servers of different versions. Codd currently supports PostgreSQL versions 10 through 15. Note that images for PostgreSQL 10 are no longer supported, but they are still available for download.

Instead of upgrading the database, I simply start a new database and apply the same schema. Since only one PostgreSQL version is used at a time, I run all of them in the same way, just specifying different images.

PostgreSQL 10

I start with PostgreSQL 10, the oldest supported version.

The following command starts a test server using PostgreSQL 10.

$ docker run \
    --detach \
    --name "pg_codd" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:10-alpine"

The following command configures the environment.

$ source config/codd-dev.sh 10

As described in the first Codd Experiment, a no-op migration is needed as a first migration.

-- codd: no-txn

SELECT 1;

The second migration initializes a schema. This is the test schema that I used in issue #133.

CREATE TABLE public.employee (
  id SERIAL NOT NULL,
  name TEXT NOT NULL,
  CONSTRAINT employee_pkey PRIMARY KEY (id)
);

CREATE TABLE public.email (
  employee__id INTEGER NOT NULL,
  address TEXT NOT NULL,
  CONSTRAINT email_pkey PRIMARY KEY (employee__id, address),
  CONSTRAINT email_employee__id_fkey
    FOREIGN KEY (employee__id)
      REFERENCES public.employee(id)
);

CREATE UNIQUE INDEX email_lower_address_unique
  ON public.email(lower(address));

The following commands add the migration. The migration files are copied so that the original files are not (re)moved.

$ cp migrations/* .
$ ./bin/codd add 0001-init-codd.sql
$ ./bin/codd add 0002-init-schema.sql

The database is consistent.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.

The PostgreSQL 10 database is no longer needed. The follow command stops and removes it, discarding the data.

$ docker stop pg_codd | xargs docker rm

PostgreSQL 11

The following command starts a test server using PostgreSQL 11.

$ docker run \
    --detach \
    --name "pg_codd" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"

The following command configures the environment.

$ source config/codd-dev.sh 11

The same migrations are added. The migration files are copied so that the original files are not (re)moved.

$ cp migrations/* .
$ ./bin/codd add 0001-init-codd.sql
$ ./bin/codd add 0002-init-schema.sql

The database is consistent.

$ ./bin/codd verify-schema
[Info] Database and expected schemas match.

Does the PostgreSQL 11 database verify using the PostgreSQL 10 schema state?

$ env CODD_EXPECTED_SCHEMA_DIR="var/codd-schema-10" \
    ./bin/codd verify-schema \
  | sed 's/^[^{]*//' \
  | python -m json.tool
{
    "schemas/public/tables/email/constraints/email_employee__id_fkey": [
        "different-schemas",
        {
            "deferrable": false,
            "deferred": false,
            "definition": "FOREIGN KEY (employee__id) REFERENCES employee(id)",
            "fk_deltype": "a",
            "fk_matchtype": "s",
            "fk_ref_table": "employee",
            "fk_updtype": "a",
            "inhcount": 0,
            "local": true,
            "noinherit": true,
            "parent_constraint": null,
            "supporting_index": "employee_pkey",
            "type": "f",
            "validated": true
        }
    ],
    "schemas/public/tables/email/constraints/email_pkey": [
        "different-schemas",
        {
            "deferrable": false,
            "deferred": false,
            "definition": "PRIMARY KEY (employee__id, address)",
            "fk_deltype": " ",
            "fk_matchtype": " ",
            "fk_ref_table": null,
            "fk_updtype": " ",
            "inhcount": 0,
            "local": true,
            "noinherit": true,
            "parent_constraint": null,
            "supporting_index": "email_pkey",
            "type": "p",
            "validated": true
        }
    ],
    "schemas/public/tables/employee/constraints/employee_pkey": [
        "different-schemas",
        {
            "deferrable": false,
            "deferred": false,
            "definition": "PRIMARY KEY (id)",
            "fk_deltype": " ",
            "fk_matchtype": " ",
            "fk_ref_table": null,
            "fk_updtype": " ",
            "inhcount": 0,
            "local": true,
            "noinherit": true,
            "parent_constraint": null,
            "supporting_index": "employee_pkey",
            "type": "p",
            "validated": true
        }
    ]
}

There are three discrepancies!

How is employee_pkey different? Here is the PostgreSQL 10 state:

$ cat var/codd-schema-10/schemas/public/tables/employee/constraints/employee_pkey \
  | python -m  json.tool
{
    "deferrable": false,
    "deferred": false,
    "definition": "PRIMARY KEY (id)",
    "fk_deltype": " ",
    "fk_matchtype": " ",
    "fk_ref_table": null,
    "fk_updtype": " ",
    "inhcount": 0,
    "local": true,
    "noinherit": true,
    "supporting_index": "employee_pkey",
    "type": "p",
    "validated": true
}

The PostgreSQL 11 version includes a parent_constraint property that does not exist in PostgreSQL 10. The other discrepancies have the exact same cause. Indeed, Codd.Representations.Database.Pg11 deals with this property.

The PostgreSQL 11 database is no longer needed. The follow command stops and removes it, discarding the data.

$ docker stop pg_codd | xargs docker rm

I expected to test more versions, but doing so is not necessary.

Conclusion

The database indeed does not verify after a PostgreSQL upgrade. What can be done in this case?

One option is to write the upgraded state to a new directory. This can be done using a command like the following.

$ ./bin/codd write-schema --dest-folder var/codd-schema-updated

It is up to administrators to verify that the new database state only contains differences due to the PostgreSQL upgrade. This would be tedious in a large database, so tooling would be very helpful. Perhaps such tooling could be built into Codd. Without tooling, one could make copies of the two directories, format all of the JSON in the copies, and use a graphical diff tool to compare them. For example, the following commands can be used to review the differences using Meld.

$ mkdir tmp
$ cp -r var/codd-schema-10 var/codd-schema-11 tmp
$ find tmp -type f -exec python -m json.tool {} {}.json \; -delete
$ meld tmp/codd-schema-10 tmp/codd-schema-11

Not all environments are upgraded at the same time, so one must be careful to use the correct database state for each environment. The database state using the old version of PostgreSQL can be removed only after all environments have been upgraded.

Since database state is version-specific, specifying the version in the name (like codd-schema-11) might be a good idea in general!