Skip to main content

Database Migration Lock - Quick Troubleshooting Guide

Quick Fix (When Tests Hang)

# Stop all database containers
docker-compose -f docker-compose.test.yml down -v

# Wait 5 seconds
sleep 5

# Start fresh
docker-compose -f docker-compose.test.yml up -d test-db

# Wait for database to be ready
sleep 15

# Run tests
npm test

Option 2: Manual Lock Release

If you don't want to restart the container:

# Release the lock directly
PGPASSWORD=testpassword psql -h 127.0.0.1 -p 5433 -U testuser -d testdb \
-c "DELETE FROM knex_migrations_lock;"

# Verify lock is released
PGPASSWORD=testpassword psql -h 127.0.0.1 -p 5433 -U testuser -d testdb \
-c "SELECT * FROM knex_migrations_lock;"

# Run tests
npm test

Option 3: Run Tests Sequentially

If locks keep occurring:

# Run tests without parallelization
npm test -- --threads false

# Or limit to single thread
npm test -- --threads 1

Understanding the Lock

What is knex_migrations_lock?

PostgreSQL table that prevents concurrent migrations:

-- This table exists in your database
SELECT * FROM knex_migrations_lock;

-- Output (when locked):
-- is_locked | locked_by
-- -----------+-----------
-- true | worker-1

Why It Happens

  1. Multiple test runners start simultaneously
  2. All try to run migrations at the same time
  3. First one acquires lock, others wait
  4. If first one crashes, lock stays held (stale lock)
  5. All subsequent tests hang waiting for lock

Timeline Example

10:00:00 - Test session 1 starts, acquires migration lock ✓
10:00:01 - Test session 2 starts, waits for lock...
10:00:02 - Test session 1 crashes without releasing lock
10:00:03 - Test session 2 still waiting... (will timeout after 10s)
10:00:13 - Test session 2 fails with "Migration table is already locked"

Prevention Strategies

1. Run Tests Serially (Most Reliable)

Edit backend/vitest.config.js:

export default defineConfig({
test: {
// Disable threading to prevent concurrent migration attempts
threads: false,

// Or limit to single worker
workers: 1,

globals: true,
environment: 'node',
setupFiles: './vitest.setup.js',
globalSetup: './global-setup.js',
testTimeout: 10000,
},
});

Trade-off: Tests run slower but never have lock conflicts

2. Increase Retry Timeout (Current Approach)

Edit backend/global-setup.js:

const MAX_RETRIES = 10;  // Increase from 5
const RETRY_DELAY_MS = 2000;

for (let i = 0; i < MAX_RETRIES; i++) {
try {
// Force free any stale locks
await dbInstance.migrate.forceFreeMigrationsLock();
await dbInstance.migrate.latest();
break;
} catch (error) {
if (i === MAX_RETRIES - 1) {
console.error('Failed after max retries');
throw error;
}
await new Promise(r => setTimeout(r, RETRY_DELAY_MS));
}
}

Total wait time: 10 retries × 2 seconds = 20 seconds max

Trade-off: Tests wait longer but usually succeed

3. Use Separate Test Databases

For parallel test workers:

# Instead of single testdb for all workers
# Create testdb_worker_0, testdb_worker_1, testdb_worker_2, etc.

Implementation: Use environment variable for worker ID:

// vitest.setup.js
const workerId = process.env.VITEST_WORKER_ID || '0';
process.env.DB_NAME = `testdb_worker_${workerId}`;

Trade-off: Complex setup, much faster parallel execution


CI/CD Configuration

GitHub Actions Example

name: Tests

on: [push, pull_request]

jobs:
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: testpassword
POSTGRES_DB: testdb
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5

steps:
- uses: actions/checkout@v3

- uses: actions/setup-node@v3
with:
node-version: '18'

- run: npm install

# Run tests serially in CI to avoid lock contention
- run: npm test -- --threads false
env:
DB_HOST: localhost
DB_PORT: 5432
DB_USER: postgres
DB_PASSWORD: testpassword
DB_NAME: testdb

Debugging

Check Current Lock Status

# Connect to test database
PGPASSWORD=testpassword psql -h 127.0.0.1 -p 5433 -U testuser -d testdb

# Inside psql:
-- See all processes
SELECT pid, usename, state, query FROM pg_stat_activity;

-- Find blocking locks
SELECT blocking_locks.pid, blocked_locks.pid
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
WHERE NOT blocked_locks.granted;

-- Check migration lock table
SELECT * FROM knex_migrations_lock;

Kill Hanging Processes

# From psql terminal
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active' AND query LIKE '%migration%';

Production vs. Test Differences

AspectTestProduction
Parallel runsYes (multiple test workers)No (single server)
Frequent restartsYes (after each test suite)No (days/months)
Migration frequencyEvery test runOn deployment only
Lock timeout10 seconds (tests)60 seconds (deployments)
Recovery neededAutomatic on restartManual unlock

Long-Term Solution: Separate Migration Service

For production, consider separate deployment step:

# 1. Run migrations (once, before app starts)
npm run migrate:latest

# 2. Start N application instances (no migrations)
npm run start:server
npm run start:server
npm run start:server

This prevents ANY lock contention because migrations are pre-done.


References