Codd Experiment 4: Migration Conflicts

Codd is a utility for managing PostgreSQL migrations, written in Haskell. Codd maintains applied transactions and a snapshot of the database state with all transactions applied. In this blog post, I experiment with migration conflicts. In a large project with many developers, migration conflicts happen quite often, particularly during early stages of development.

Test Project

I am organizing this experiment like the one described in the Codd Experiment blog entry. I manage all migrations in a migrations directory and link to applied migrations in the Codd-managed directory, stored under a var directory. See the blog entry for details.

$ mkdir /tmp/codd-conflict
$ cd /tmp/codd-conflict
$ mkdir -p bin config migrations var/codd-{migrations,schema}
$ wget -O bin/codd \
    https://github.com/mzabani/codd/releases/download/v0.1.1/codd
$ chmod 0755 bin/codd

This experiment simulates migration merge conflicts between two developers, Alice and Bob. File config/codd-alice.sh configures Alice’s development environment.

export CODD_CONNECTION=postgres://postgres:password@127.0.0.1/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable

File config/codd-bob.sh configures Bob’s development environment. The only difference is that it uses a different host port, required since I run the servers simultaneously on the same system.

export CODD_CONNECTION=postgres://postgres:password@127.0.0.1:5433/postgres
export CODD_EXPECTED_SCHEMA_DIR=var/codd-schema
export CODD_MIGRATION_DIRS=var/codd-migrations
export CODD_TXN_ISOLATION=serializable

Test Servers

For this experiment, I use postgres containers to run PostgreSQL servers. The following command runs Alice’s development server on host port 5432.

$ docker run \
    --detach \
    --name "pg_codd_alice" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"

The following command runs Bob’s development server on host port 5433.

$ docker run \
    --detach \
    --name "pg_codd_bob" \
    --publish "127.0.0.1:5433:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"

Project Initialization

Alice performs project initialization. The following command configures Alice’s environment.

$ source config/codd-alice.sh

This experiment uses Git. The following command initializes the repository.

$ git init

For simplicity, I initialize this test project directly in the main branch. The bin directory is just used to make this experiment easily repeatable, so I do not include it in the repository.

$ echo bin > .gitignore
$ git add .gitignore config
$ git commit -m 'Initial commit'

Migration migrations/0001-init-codd.sql is a special no-op migration that is required to make Codd initialize the codd_schema.sql_migrations table correctly. See the Codd Initialization section for details.

-- codd: no-txn

SELECT 1;

Alice adds and links this migrations. Note that I replace Codd filename timestamps with ellipsis in the commands below. If you are following along, you will have different timestamps.

$ ./bin/codd add migrations/0001-init-codd.sql
$ ln -s \
    ../var/codd-migrations/...-0001-init-codd.sql \
    migrations/0001-init-codd.sql

Previous migrations must never be modified. The following command makes this migration read-only.

$ chmod 0444 var/codd-migrations/*

The following commands create the initial commit, including this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0001-init-codd'

Migration migrations/0002-add-employee.sql adds an employee table. Note that this migration is added in a separate Git commit so that Git retains a history of every snapshot. If this migration had been added in the above commit, then the snapshot representing the state of a database after the first migration has been applied would not be stored.

CREATE TABLE employee (
  id SERIAL NOT NULL,
  name TEXT NOT NULL,
  CONSTRAINT employee_pkey PRIMARY KEY (id)
);

Alice adds this migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0002-add-employee.sql
$ ln -s \
    ../var/codd-migrations/...-0002-add-employee.sql \
    migrations/0002-add-employee.sql
$ chmod 0444 var/codd-migrations/*

The following commands create a commit that includes this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0002-add-employee'

Merge Conflict

Codd represents the current schema (“snapshot”) in var/codd-schema using hierarchically organized JSON files. For example, the following shows this hierarchy for the current snapshot.

$ find var/codd-schema -type f | sort
var/codd-schema/db-settings
var/codd-schema/roles/postgres
var/codd-schema/schemas/public/objrep
var/codd-schema/schemas/public/sequences/employee_id_seq
var/codd-schema/schemas/public/tables/employee/cols/id
var/codd-schema/schemas/public/tables/employee/cols/name
var/codd-schema/schemas/public/tables/employee/constraints/employee_pkey
var/codd-schema/schemas/public/tables/employee/indexes/employee_pkey
var/codd-schema/schemas/public/tables/employee/objrep

Git merge conflicts only occur when there are multiple changes to the same JSON file. For example, changes to the same column results in a merge conflict, while changing different columns or adding different new columns would not result in conflicts.

I think that actual merge conflicts will be quite rare in practice, but I test one in this section anyway.

Alice (1)

Alice works on a new feature in a feature branch. The following command creates a branch named alice.

$ git checkout -b alice

The project requires a contact email for all employees. A migration cannot simply add a non-nullable column to a table that has existing rows, because existing rows would violate the NOT NULL constraint if they have no email value. There are various ways to deal with this.

This migration allows the email column to be null. Software can then ensure that any new rows include an email address, and any existing employees can be asked to register an email address. A later migration can change the column constraints after all rows have an email address.

For the sake of the example, the hypothetical software searches by email address. A index is added to support this case, and note that it is a partial index because the column is nullable.

Migration migrations/0003-add-employee-email.sql adds an email column to the employee table, as well as the partial index.

ALTER TABLE employee
  ADD COLUMN email TEXT;

CREATE INDEX employee_email_index
  ON employee(email)
  WHERE email IS NOT NULL;

Alice adds this migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0003-add-employee-email.sql
$ ln -s \
    ../var/codd-migrations/...-0003-add-employee-email.sql \
    migrations/0003-add-employee-email.sql
$ chmod 0444 var/codd-migrations/*

The following commands create a commit (in the alice branch) that includes this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0003-add-employee-email'

Alice then does further work on the feature. To add another commit, I represent this with a simple addition of a README.

$ echo "Alice was here!" > README.md
$ git add README.md
$ git commit -m 'Add README'

Bob (1)

Simultaneously, Bob works in a branch named bob, based on the main branch.

$ git checkout main
$ git checkout -b bob

The following commands load Bob’s configuration and synchronizes their development database. Note that the database state is that of the main branch, not the alice branch.

$ source config/codd-bob.sh
$ codd up

For the sake of the example, Bob also adds an email migration. Migration migrations/0003-add-employee-email.sql is similar to Alice’s migration, expect that it configures the index differently. Bob’s index forces contact email addresses to be unique, and uses lower so support case-insensitive searching. Note that it also uses migration index 3, as it is a migration from state 2 to state 3.

ALTER TABLE employee
  ADD COLUMN email TEXT;

CREATE UNIQUE INDEX employee_email_index
  ON employee(lower(email))
  WHERE email IS NOT NULL;

Note that I would normally name this index employee_lower_email_unique, but I do not use this name because I want a merge conflict.

Bob adds this migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0003-add-employee-email.sql
$ ln -s \
    ../var/codd-migrations/...-0003-add-employee-email.sql \
    migrations/0003-add-employee-email.sql
$ chmod 0444 var/codd-migrations/*

The following commands create a commit (in the bob branch) that includes this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0003-add-employee-email'

Bob then does further work on the feature. This work may include conflicts with Alice’s work, so I represent this with a simple addition of a README.

$ echo "Bob was here!" > README.md
$ git add README.md
$ git commit -m 'Add README'

Bob finishes development before Alice and merges to the main branch.

$ git checkout main
$ git merge bob

Alice (2)

Alice inspects the new commits to main and decides to incorporate them in the alice branch.

$ git checkout alice
$ source config/codd-alice.sh

Before doing anything, Alice’s version of the email migration is in the Codd migration directory, and there is a link to it in the migrations directory. The Codd snapshot directory contains the state of the database with this migration.

When rebasing on main, Git merges Bob’s migration and snapshot with Alice’s at a line level. Since the same index is configured differently, there are merge conflicts.

$ git rebase main

File migrations/0003-add-employee-email.sql has a merge conflict. It is a link to a migration file in the var/codd-migrations directory. The link from the main branch, merged from the bob branch, points to Bob’s migration, while the link from the alice brand points to Alice’s migration. The actual migration files in var/codd-migrations do not have merge conflicts because Codd timestamps them and they have different timestamps.

File var/codd-schema/schemas/public/tables/employee/indexes/employee_email_index also has a merge conflict. The JSON is minimized, so the full lines are marked as different. This is not very user friendly. (I created issue #137 about this.) The following commands show the differences. I change to the directory just so the command is not too long.

$ cd var/codd-schema/schemas/public/tables/employee/indexes/
$ diff \
    <(sed -n 2p employee_email_index | python -m json.tool) \
    <(sed -n 4p employee_email_index | python -m json.tool)
$ cd -

There are two differences. The definition is different, and the main version is unique while the alice version is not.

5c5
<     "definition": "CREATE UNIQUE INDEX employee_email_index ON public.employee USING btree (lower(email)) WHERE (email IS NOT NULL)",
---
>     "definition": "CREATE INDEX employee_email_index ON public.employee USING btree (email) WHERE (email IS NOT NULL)",
18c18
<     "unique": true
---
>     "unique": false

How can Alice resolve the conflicts? First, the duplicate migrations that are not actually conflicting need to be resolved. Alice and Bob discuss and decide that a unique constraint should not be used, but lower is a good idea. Alice needs to change the migration to the following.

ALTER TABLE employee
  ADD COLUMN email TEXT;

CREATE INDEX employee_email_index
  ON employee(lower(email))
  WHERE email IS NOT NULL;

This is a combinations of both migrations, and neither Alice’s nor Bob’s snapshot match the resulting state. Perhaps it is best to go back to the previous state and add the updated migration.

The following commands remove both versions of the migration.

$ rm migrations/0003-add-employee-email.sql
$ git rm var/codd-migrations/*-0003-add-employee-email.sql

Alice uses git log to find the commit hash of the commit that has the desired snapshot.

$ git log

The following commands restore the snapshot from that commit.

$ rm -rf var/codd-schema
$ git checkout ...hash... var/codd-schema

Alice’s development database needs to be reset as well.

$ docker stop pg_codd_alice | xargs docker rm
$ docker run \
    --detach \
    --name "pg_codd_alice" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"
$ ./bin/codd up

Alice now saves the updated migration to migrations/0003-add-employee-email.sql, adds it, links it, and sets it read-only.

$ ./bin/codd add migrations/0003-add-employee-email.sql
$ ln -s \
    ../var/codd-migrations/...-0003-add-employee-email.sql \
    migrations/0003-add-employee-email.sql
$ chmod 0444 var/codd-migrations/*

The updated files can now be added to the Git index.

$ git add \
    var/codd-migrations/...-0003-add-employee-email.sql \
    migrations/0003-add-employee-email.sql \
    var/codd-schema/schemas/public/tables/employee/indexes/employee_email_index

Alice continues the rebase.

$ git rebase --continue

The next commit, representing changes to the software also has a conflict.

<<<<<<< HEAD
Bob was here!
=======
Alice was here!
>>>>>>> 97de627 (Add README)

Alice resolves the conflict and finishes the rebase.

$ echo 'Alice and Bob were here!' > README.md
$ git add README.md
$ git rebase --continue

Alice finishes development and merges to the main branch.

$ git checkout main
$ git merge alice

Bob (2)

Bob, as well as any other developer that used the main branch after Bob merged the changes, must delete and recreate their development databases in order to use the updated migration.

$ source config/codd-bob.sh

Bob updates the bob branch.

$ git checkout bob
$ git rebase main

Bob resets the pg_codd_bob database.

$ docker stop pg_codd_bob | xargs docker rm
$ docker run \
    --detach \
    --name "pg_codd_bob" \
    --publish "127.0.0.1:5433:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"
$ ./bin/codd up

No Merge Conflict

It is pretty rare for developers to simultaneously change the same column, index, etc. It is much more common for developers to simultaneously work on disjoint migrations, though. In this case, Git does not register a merge conflict, but the developer still needs to resolve the migration conflict.

At a minimum, a migration index needs to be changed. This is a disadvantage to using migration indices. Why use them? Developers need to organize multi-step migrations and migration inverses. Multi-step migrations are not applied at the same time. (It they can be applied at the same time, then there is no need for a multi-step migration!) Migration inverses are created to handle upgrade problems, which may be completely unrelated to the database. They are usually not used, but it is very important to have them available. Migration indices provide an easy way to organize all migrations in the desired order.

Alice (3)

Alice is up first.

$ git checkout alice
$ source config/codd-alice.sh

Migration migrations/0004-add-employee-address.sql adds an address column to the employee table.

ALTER TABLE employee
  ADD COLUMN address TEXT;

Alice adds this migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0004-add-employee-address.sql
$ ln -s \
    ../var/codd-migrations/...-0004-add-employee-address.sql \
    migrations/0004-add-employee-address.sql
$ chmod 0444 var/codd-migrations/*

The following commands create a commit (in the alice branch) that includes this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0004-add-employee-address'

Bob (3)

Bob is up next.

$ git checkout bob
$ source config/codd-bob.sh

Migration migrations/0004-add-employee-phone.sql adds a phone column to the employee table. Note that it also uses migration index 4, as it is a migration from state 3 to state 4.

ALTER TABLE employee
  ADD COLUMN phone TEXT;

Bob adds this migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0004-add-employee-phone.sql
$ ln -s \
    ../var/codd-migrations/...-0004-add-employee-phone.sql \
    migrations/0004-add-employee-phone.sql
$ chmod 0444 var/codd-migrations/*

The following commands create a commit (in the bob branch) that includes this migration and the resulting snapshot.

$ git add migrations var
$ git commit -m 'Add migration 0004-add-employee-phone'

Bob finishes development and merges to the main branch.

$ git checkout main
$ git merge bob

Alice (4)

Alice now needs to merge those changes into the alice branch.

$ git checkout alice
$ source config/codd-alice.sh

There are no conflicts, but just merging does not work. There are two issues:

A migration index needs to be changed. Bob’s migration was merged first, so Alice’s migration index should be changed from 4 to 5.
Both migrations add a column to the same table. Though the snapshots could be merged by Git without conflict, there is a database conflict: the new columns cannot both have the same column order number. Since Alice’s migration is second, the address column order number must be changed.

Alice needs to do an interactive rebase to edit the commit.

$ git rebase -i main

What is the best way to handle the snapshot change? This change is so simple that it could be done by hand, but that could be error-prone in more complex migrations. Setting up the main state and adding the updated migration is safer.

Alice makes a copy of the address migration using the new migration index, and then removes the old migration.

$ cp \
    var/codd-migrations/*-0004-add-employee-address.sql \
    migrations/0005-add-employee-address.sql
$ git rm var/codd-migrations/*-0004-add-employee-address.sql
$ git rm migrations/0004-add-employee-address.sql

Alice uses git log to find the commit hash of the main commit, which has the desired snapshot.

$ git log

The following commands restore the snapshot from that commit.

$ rm -rf var/codd-schema
$ git checkout ...hash... var/codd-schema

Alice’s development database needs to be reset as well.

$ docker stop pg_codd_alice | xargs docker rm
$ docker run \
    --detach \
    --name "pg_codd_alice" \
    --publish "127.0.0.1:5432:5432" \
    --env POSTGRES_DB="postgres" \
    --env POSTGRES_USER="postgres" \
    --env POSTGRES_PASSWORD="password" \
    "postgres:11-alpine"
$ ./bin/codd up

Alice now adds the updated migration, links it, and sets it read-only.

$ ./bin/codd add migrations/0005-add-employee-address.sql
$ ln -s \
    ../var/codd-migrations/...-0005-add-employee-address.sql \
    migrations/0005-add-employee-address.sql
$ chmod 0444 var/codd-migrations/*

Checking git status, the address column is indeed updated. The git diff command shows that the order value is incremented as expected. The updated files can now be added to the Git index.

$ git add migrations var

Alice commits these changes, updating the commit message to use the new migration index, and finishes the rebase.

$ git commit --amend
$ git rebase --continue

Alice merges to the main branch.

$ git checkout main
$ git merge alice

Bob (4)

In the merge conflict example, Bob needed to reset the pg_codd_bob database to use the updated migration. In this case, the updated migration was never applied to that database, so doing so is not necessary. Bob can use codd up to synchronize it.

$ git checkout bob
$ git rebase main
$ source config/codd-bob.sh
$ codd up

Cleanup

The following commands stop the containers, remove the test project, and exit the shell that includes Codd configuration.

$ docker stop pg_codd_alice pg_codd_bob | xargs docker rm
$ cd /tmp
$ rm -rf codd-conflict
$ exit

Conclusion

Resolving migration conflicts using Codd is a bit more complicated that I had anticipated, even with the trivial migrations that I used for testing. I am afraid that this will be error-prone, particularly for changes that Git does not register conflicts for.

My actual project uses Nix and Gargoyle, and the development databases are managed by the software that we are developing. This makes the above steps even more tricky than simply using Docker. My coworkers might not be very happy with this way of resolving migration conflicts.

Perhaps resolving migration conflicts could be simplified via tooling. Perhaps I should develop some scripts…

Author

Travis Cardwell

Published

April 6, 2023

Revised

April 6, 2023