vt-c-safe-migrations

Execute database migrations safely with zero-downtime patterns. Covers dual-write strategies, column shadowing, backfills, and rollback validation for large tables.

Plugin: core-standards
Category: Operations
Command: /vt-c-safe-migrations


Safe Migrations Skill

Purpose: Execute database migrations without downtime, data loss, or locking issues. Critical for production systems with continuous deployment.

Why Safe Migrations Matter

Unsafe migrations can cause:

- Downtime - table locks block all queries
- Data loss - improper rollbacks lose data
- Timeouts - long-running migrations fail
- Deployment failures - incompatible code + schema


Core Principles

1. Backwards Compatible Changes First

Deploy code that works with BOTH old and new schema, then migrate.

Deploy 1: Code handles both schemas
Deploy 2: Run migration
Deploy 3: Remove old schema handling
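A minimal sketch of what "code handles both schemas" means in practice. The function and column names here are illustrative, and a plain hash stands in for a database row:

```ruby
# Sketch: application code that tolerates both the old and new schema.
# Before the migration, rows have no "status" column; afterwards they do.
def display_status(record)
  # New schema stores an explicit status; old schema implies 'pending'.
  record.fetch("status", "pending")
end

old_row = { "id" => 1 }                        # row before migration
new_row = { "id" => 2, "status" => "shipped" } # row after migration

display_status(old_row) # => "pending"
display_status(new_row) # => "shipped"
```

Because this code is correct against either schema, the migration can run at any point between Deploy 1 and Deploy 3 without coordination.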

2. Never Lock Large Tables

-- ❌ BAD - Volatile default forces a full table rewrite under an
-- ACCESS EXCLUSIVE lock
ALTER TABLE users ADD COLUMN external_id UUID NOT NULL DEFAULT gen_random_uuid();

-- ✅ GOOD - Nullable column, no default
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Metadata-only in Postgres: no rewrite for nullable columns without
-- a default (and, since Postgres 11, even with a constant default)

3. Small, Incremental Changes

❌ One big migration that does everything
✅ Multiple small migrations that can be rolled back

Safe Patterns by Operation

Adding a Column

Safe - no lock for nullable columns:

# Rails migration
class AddPhoneToUsers < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :phone, :string
    # nullable, no default = no lock
  end
end

With default value (requires backfill):

# Step 1: Add column without default
class AddStatusToOrders < ActiveRecord::Migration[7.0]
  def change
    add_column :orders, :status, :string
  end
end

# Step 2: Backfill in batches
class BackfillOrderStatus < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    Order.in_batches(of: 1000) do |batch|
      batch.where(status: nil).update_all(status: 'pending')
    end
  end
end

# Step 3: Add default for new records
class AddDefaultToOrderStatus < ActiveRecord::Migration[7.0]
  def change
    change_column_default :orders, :status, 'pending'
  end
end

# Step 4: Add NOT NULL constraint
class RequireOrderStatus < ActiveRecord::Migration[7.0]
  def change
    change_column_null :orders, :status, false
  end
end

Removing a Column

Never remove immediately - use three-phase approach:

# Phase 1: Stop writing to column (deploy code change)
# Phase 2: Remove column reads from code
# Phase 3: Drop column after confirming no usage

class RemoveDeprecatedField < ActiveRecord::Migration[7.0]
  def change
    safety_assured do  # strong_migrations helper: asserts the drop is intentional
      remove_column :users, :deprecated_field
    end
  end
end
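For phases 1 and 2, Rails provides `ignored_columns` on the model, which makes ActiveRecord behave as if the column were already gone (it stops being read, written, or cached in the schema):

```ruby
class User < ApplicationRecord
  # Phases 1-2: application code no longer reads or writes the column,
  # even though it still exists in the database
  self.ignored_columns += ["deprecated_field"]
end
```

Deploy this before the migration that drops the column; otherwise ActiveRecord's cached column list can cause errors on servers still running old code.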

Renaming a Column

Use dual-write pattern:

# Step 1: Add new column
add_column :users, :full_name, :string

# Step 2: Deploy code that writes to BOTH columns
# Step 3: Backfill old data
User.in_batches.update_all('full_name = name')

# Step 4: Deploy code that reads from new column only
# Step 5: Deploy code that writes to new column only
# Step 6: Remove old column
remove_column :users, :name
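The dual-write step (step 2) can be sketched in plain Ruby; in a Rails model this would typically be a `before_save` callback instead of a custom setter, and the class here is a standalone stand-in, not an ActiveRecord model:

```ruby
# Sketch of dual-writing during a column rename.
class User
  attr_reader :name, :full_name

  def name=(value)
    @name = value       # old column: still read by not-yet-updated code
    @full_name = value  # new column: read after the cutover deploy
  end
end

u = User.new
u.name = "Ada Lovelace"
u.full_name # => "Ada Lovelace"
```

Keeping both columns in sync means deploys in steps 4 and 5 can roll back safely: either column holds a complete, current value.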

Adding an Index

Always use CONCURRENTLY:

class AddIndexToOrders < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_index :orders, :user_id, algorithm: :concurrently
  end
end
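One caveat with concurrent builds: if a `CREATE INDEX CONCURRENTLY` fails or is cancelled, Postgres leaves behind an INVALID index that must be dropped and rebuilt. A standard catalog query finds these leftovers:

```sql
-- List indexes left INVALID by a failed CONCURRENTLY build
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```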

Changing Column Type

Create new column, migrate data, swap:

# Step 1: Add new column with new type
add_column :products, :price_cents, :bigint

# Step 2: Backfill
Product.in_batches(of: 1000) do |batch|
  batch.update_all('price_cents = price * 100')
end

# Step 3: Deploy code using new column
# Step 4: Remove old column
remove_column :products, :price

Large Table Strategies

For tables with millions of rows:

Batch Updates

# ❌ BAD - Updates all at once
User.update_all(verified: true)

# ✅ GOOD - Batch updates
User.in_batches(of: 1000) do |batch|
  batch.update_all(verified: true)
  sleep(0.1) # Prevent overwhelming the database
end

gh-ost / pt-online-schema-change

For MySQL, use tools that create shadow tables:

# Percona toolkit
pt-online-schema-change \
  --alter "ADD COLUMN phone VARCHAR(20)" \
  D=mydb,t=users \
  --execute

PostgreSQL-specific

-- Create new table with desired schema
CREATE TABLE users_new (LIKE users INCLUDING ALL);
ALTER TABLE users_new ADD COLUMN phone VARCHAR(20);

-- Copy data in batches (keyset pagination; $last_id is the highest id
-- copied so far, starting at 0)
INSERT INTO users_new
SELECT *, NULL AS phone FROM users
WHERE id > $last_id
ORDER BY id
LIMIT 10000;

-- Swap tables (brief lock)
BEGIN;
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
COMMIT;
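Note the batch copy above leaves a window where rows written to `users` never reach `users_new`. One common fix is a trigger that mirrors writes into the shadow table while the backfill runs; a simplified insert-only sketch (column list is illustrative, and a real version must also handle UPDATE and DELETE, which is why tools like pg_repack exist):

```sql
-- Mirror new rows into the shadow table during the backfill
CREATE FUNCTION mirror_users_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO users_new (id, phone) VALUES (NEW.id, NULL);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION requires Postgres 11+; older versions use EXECUTE PROCEDURE
CREATE TRIGGER users_mirror_insert
AFTER INSERT ON users
FOR EACH ROW EXECUTE FUNCTION mirror_users_insert();
```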

Rollback Strategy

Always Have Rollback Plan

class AddFieldToUsers < ActiveRecord::Migration[7.0]
  def up
    add_column :users, :preferences, :jsonb, default: {}
  end

  def down
    remove_column :users, :preferences
  end
end

Test Rollback Before Deploying

# Run migration
rails db:migrate

# Test rollback
rails db:rollback STEP=1

# Run again
rails db:migrate

Irreversible Migrations

class DropLegacyTable < ActiveRecord::Migration[7.0]
  def up
    # Backup first!
    execute "CREATE TABLE legacy_backup AS SELECT * FROM legacy_table"
    drop_table :legacy_table
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Pre-Migration Checklist

## Migration Safety Checklist

### Before Writing
- [ ] Is this change backwards compatible?
- [ ] Can the old code work with new schema?
- [ ] Is table size < 1M rows? If not, use batching

### Before Deploying
- [ ] Tested in staging with production-like data
- [ ] Rollback tested and working
- [ ] Backup verified
- [ ] Deployment window identified (if needed)
- [ ] Monitoring in place

### During Migration
- [ ] Watch for lock waits
- [ ] Monitor query latency
- [ ] Check replication lag (if applicable)

### After Migration
- [ ] Verify data integrity
- [ ] Check application logs for errors
- [ ] Confirm old code still works (if dual-deploy)
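The "watch for lock waits" item can be checked directly in Postgres while the migration runs:

```sql
-- Sessions currently blocked waiting on a lock (Postgres 9.6+)
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

A sudden spike in rows here during a migration is the cue to cancel it before application queries start timing out.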

Tools

strong_migrations (Rails)

# Gemfile
gem 'strong_migrations'

# Catches unsafe migrations
class AddIndexToUsers < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :email  # ❌ Blocked! Use add_index with algorithm: :concurrently
  end
end

pg_repack (PostgreSQL)

# Rebuild table without locking
pg_repack -t users mydb

gh-ost (MySQL)

# Online schema change
gh-ost \
  --alter="ADD COLUMN phone VARCHAR(20)" \
  --database=mydb \
  --table=users \
  --execute

Common Mistakes

1. Adding NOT NULL without default

-- ❌ Fails if existing rows have NULL
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL;

-- ✅ Add nullable first, backfill, then add constraint

2. Large index without CONCURRENTLY

-- ❌ Locks table during index creation
CREATE INDEX idx_users_email ON users(email);

-- ✅ Non-blocking index creation
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

3. No rollback tested

❌ Assume rollback works
✅ Actually run rollback in staging

4. Deploying code + migration together

❌ Code expects new column that doesn't exist yet
✅ Deploy migration first, then code (or backwards-compatible code)