Migration Playbook for Agents
Agent-facing decision tree + recipes for making schema and permission changes in Mantle/DSQL projects. Use this when you need to "do a migration" and want a single page that tells you exactly which path to take.
For the full reference (DSQL constraints table, validation tooling details, troubleshooting), see Database Changes.
Decision Tree
Need a schema change?
├── Adding a column?
│ ├── Nullable? ─────────────────► S1: ADD COLUMN nullable (additive ALTER, metadata-only)
│ ├── NOT NULL with backfill? ──► S14: ADD nullable → UPDATE backfill → SET NOT NULL (recreation)
│ └── NOT NULL or DEFAULT? ─────► S2: ADD COLUMN ... NOT NULL/DEFAULT (recreation — automatic)
├── Dropping a column? ───────────► S5: DROP COLUMN (recreation — automatic)
├── Changing a column type? ──────► S6: ALTER COLUMN TYPE (recreation — automatic)
├── Renaming a column? ───────────► S7: drizzle-kit requires TTY → use --custom + manual ALTER
├── Adding an index? ─────────────► S3: CREATE INDEX (auto ASYNC)
├── Adding a unique constraint?
│ ├── Want onConflict-eligible? ─► S4a: uniqueIndex() → CREATE UNIQUE INDEX (preferred)
│ └── Want constraint semantics? ─► S4b: .unique() → ADD CONSTRAINT (recreation)
└── Tightening nullable → NOT NULL? ► S14: ADD nullable → backfill → SET NOT NULL (recreation)
Need a data migration?
└── Hand-write SQL + manual journal entry ► S9: hand-written recipe
After ANY schema change:
1. mantle generate permissions (if Lambda table access changed)
2. mantle check migrations (always — catches journal drift)
3. mantle db check-dsql (always — catches DSQL incompatibilities)
4. mantle deploy --stage staging (auto-builds; runs MigrateDSQL)
Deploy failed mid-batch?
└── S13: idempotent retry — fix the failing migration, redeploy
Recreation failed mid-step?
└── Recreation recovery recipe (below)
Squashing migration history?
└── Phase 7 consolidation recipe (below)The Universal Workflow
Every schema change follows the same 5 steps:
# 1. Edit schema
$EDITOR src/db/schema.ts
# 2. Generate migration SQL
npx mantle db generate # for adds/drops/renames Drizzle handles
npx mantle db generate --custom --name <desc> # agent-safe (always available)
# 3. Validate locally — no deploy required
npx mantle check migrations # journal ↔ SQL file consistency
npx mantle db check-dsql # DSQL compatibility classification per statement
npx mantle db migrate --dry-run # preview against mock schema (no mutations)
# 4. Regenerate permissions if Lambda table access changed
npx mantle generate permissions
# 5. Deploy (auto-builds; runs MigrateDSQL Lambda)
AWS_PROFILE=mantle-<Instance> npx mantle deploy --stage stagingAgent rule: Always use --custom from non-TTY environments. drizzle-kit's tablesResolver requires a real terminal for new-table and rename prompts; without one it fails fast with an actionable error.
Scenario Recipes
S1: Add a Nullable Column (additive ALTER)
The simplest path. DSQL accepts ALTER TABLE ... ADD COLUMN <name> <type> without constraints as a metadata-only operation.
// schema.ts
export const myTable = pgTable('my_table', {
// ...existing columns
notes: text('notes') // no .notNull(), no .default()
})-- migrations/NNNN_add_notes.sql
ALTER TABLE "my_table" ADD COLUMN "notes" text;check-dsql verdict: [1] OKRuntime: ~80ms; no data impact; existing rows get NULL for the new column.
S2: Add a NOT NULL or DEFAULT Column (automatic recreation)
DSQL rejects ALTER TABLE ... ADD COLUMN ... NOT NULL and ALTER TABLE ... ADD COLUMN ... DEFAULT N on populated tables. The classifier detects this and routes to the recreation pipeline automatically.
// schema.ts
export const myTable = pgTable('my_table', {
// ...existing columns
priority: integer('priority').notNull().default(0)
})-- migrations/NNNN_add_priority.sql
ALTER TABLE "my_table" ADD COLUMN "priority" integer DEFAULT 0 NOT NULL;check-dsql verdict: [1] RECREATION (NOT OK — if it shows OK on your classifier, you are on an old framework version pre-F4/F9 fix; upgrade @mantleframework/database) Runtime: ~1100ms (recreation: build temp table → INSERT data → DROP original → RENAME) Async indexes: All indexes on the original table are recreated as CREATE INDEX ASYNC on the new table. Physical index build runs in the background; sys.jobs tracks status.
Alternative — manual two-step (S14): If you need explicit control over backfill values per row (computed, joined from another table, etc.), use S14's pattern: ADD nullable → UPDATE backfill → SET NOT NULL. The automatic recreation path uses the literal DEFAULT for all existing rows.
S3: Add an Index (auto-ASYNC)
// schema.ts
export const myTable = pgTable('my_table', {/* ... */}, (table) => [
index('idx_my_table_counter').on(table.counter)
])-- migrations/NNNN_add_counter_index.sql
CREATE INDEX "idx_my_table_counter" ON "my_table" USING btree ("counter");check-dsql verdict: [1] INDEX (rewritten to CREATE INDEX ASYNC by sanitizer) Runtime: ~120ms to record the migration; physical index build is async (track via SELECT * FROM sys.jobs WHERE table_name = 'my_table') Gotcha (F10 fix): If your migration file has leading -- comments, the classifier strips them before pattern matching. The result is the same either way — but if you want [1] INDEX instead of [1] OK in the report, put the CREATE INDEX statement first.
S4: Add a Unique Constraint
Two paths depending on your onConflict requirements. In DSQL, both UNIQUE INDEX and UNIQUE CONSTRAINT are stored as UNIQUE INDEX entries in pg_indexes — there is no pg_constraint row for either. So onConflictDoUpdate works with either. The practical difference is at migration time:
S4a (preferred): uniqueIndex() — async path, no recreation
export const myTable = pgTable('my_table', {/* ... */}, (table) => [
uniqueIndex('uq_my_table_name_counter').on(table.name, table.counter)
])CREATE UNIQUE INDEX "uq_my_table_name_counter" ON "my_table" ("name", "counter");check-dsql verdict: [1] INDEX (auto-ASYNC) Runtime: ~120ms; index built in background.
S4b: .unique() — ADD CONSTRAINT, triggers recreation
export const myTable = pgTable('my_table', {/* ... */}, (table) => [
unique('uq_my_table_name').on(table.name)
])ALTER TABLE "my_table" ADD CONSTRAINT "uq_my_table_name" UNIQUE ("name");check-dsql verdict: [1] RECREATIONRuntime: longer (full table recreation)
Recommendation: Default to uniqueIndex(). Only use .unique() if you specifically need ADD CONSTRAINT semantics (e.g., to match a legacy schema or to express a CHECK constraint alongside).
S5: Drop a Column (automatic recreation)
// schema.ts — remove the column from the pgTable definition-- migrations/NNNN_drop_metadata.sql
ALTER TABLE "my_table" DROP COLUMN "metadata";check-dsql verdict: [1] RECREATIONRuntime: ~900ms; data in the dropped column is gone. Recreation details: Temp table built without the dropped column, data INSERTed from original (column omitted), original dropped, temp renamed.
S6: Change a Column Type (automatic recreation)
// schema.ts — change the column type
export const myTable = pgTable('my_table', {
// counter: integer → bigint
counter: bigint('counter', {mode: 'number'}).notNull().default(0)
})-- migrations/NNNN_widen_counter.sql
ALTER TABLE "my_table" ALTER COLUMN "counter" SET DATA TYPE bigint;check-dsql verdict: [1] RECREATIONRuntime: ~1000ms; data is cast from the old type to the new type during the INSERT step. Cast failures abort the migration. Constraint: The cast must be lossless or DSQL will reject it (e.g., text → integer will fail on any non-numeric row).
S7: Rename a Column (TTY trap — use --custom)
drizzle-kit's columnsResolver fires an interactive "is this a rename of X?" prompt for every removed+added column pair. This requires a TTY. From an agent or CI context, mantle db generate will fail fast — use the agent-safe workflow:
# 1. Generate an empty migration
npx mantle db generate --custom --name rename_counter_to_count
# 2. Hand-write the SQL (DSQL supports RENAME COLUMN as a metadata-only ALTER)-- migrations/NNNN_rename_counter_to_count.sql
ALTER TABLE "my_table" RENAME COLUMN "counter" TO "count";check-dsql verdict: [1] OKRuntime: ~80ms; metadata-only. Caveat: Also update schema.ts to match the new column name. Drizzle has no way to detect the rename without the TTY prompt, so the source-of-truth update is manual.
S8: Hand-Written SQL WITHOUT a Journal Entry (anti-pattern)
If you create a .sql file in migrations/ without adding a corresponding entry to meta/_journal.json, the migration will never run. mantle check migrations catches this:
$ npx mantle check migrations
ERROR: Orphaned SQL files (no journal entry): 0019_my_orphan.sqlFix: Either delete the SQL file or add the journal entry (see S9 recipe).
S9: Hand-Written SQL WITH Manual Journal (data migrations / multi-step DDL)
For data migrations (INSERT/UPDATE/DELETE on existing rows), multi-step DDL (e.g., backfill + SET NOT NULL), or conditional DDL (IF NOT EXISTS guards), hand-write the SQL and manually update the journal.
# 1. Generate empty migration file with descriptive name
npx mantle db generate --custom --name backfill_priority
# 2. Write SQL into migrations/NNNN_backfill_priority.sql-- migrations/0023_backfill_priority.sql
UPDATE "my_table" SET "priority" = 1 WHERE "is_active" = true;
UPDATE "my_table" SET "priority" = 0 WHERE "is_active" = false;# 3. Verify the journal entry was added (mantle db generate --custom does this for you)
npx mantle check migrationsIf you wrote the SQL manually without mantle db generate --custom, add the journal entry by hand:
{
"idx": 23,
"version": "7",
"when": 1747000000000,
"tag": "0023_backfill_priority",
"breakpoints": true
}The tag field must match the SQL filename without .sql. The idx is the next sequential integer. when is Date.now() at the time you create the file.
S10: Missing Permission Regeneration (after schema change)
If you add a new @RequiresTable annotation to a Lambda handler but forget to run mantle generate permissions, the new role + GRANT will not exist in permissions.sql. Deploy will succeed, but the Lambda will hit permission denied for table at runtime.
Detection: mantle generate permissions is idempotent. Always run it after schema or Lambda changes. If unsure, run it — it produces a diff-able file that shows you whether anything changed.
npx mantle generate permissions
git diff permissions/permissions.sql # shows changesConvention: Run mantle generate permissions BEFORE every mantle deploy. The deploy will not catch this gap (the validatePermissions warning fires only after deploy completes).
S11: GRANT Silent Success on Non-Existent Table (DSQL footgun)
DSQL silently succeeds on GRANT ... ON <nonexistent_table>. No error is raised. The grant is recorded but has no effect. This means:
- A partial migration failure leaves some tables uncreated
- The subsequent permissions step appears to apply ALL grants successfully
- Lambdas hit
permission deniedat runtime for the uncreated tables information_schema.role_table_grantsfilters by current-user visibility — agents querying it may get false negatives
Detection (F2 — important): Do NOT use information_schema.role_table_grants for permission verification from a non-owner connection. Use pg_class.relacl instead:
SELECT
c.relname AS table_name,
c.relacl AS acl
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
AND c.relkind = 'r'
AND c.relname = '<your_table>';The relacl array shows ALL grants on the table, including those owned by other roles.
Safety net: MigrateDSQL runs validatePermissions after applyPermissions on every deploy. Any GRANT targeting a non-existent table is logged as a structured warning:
{
"level": "WARN",
"message": "DSQL permission validation found gaps (GRANTs targeting non-existent tables)",
"gapCount": 1,
"gaps": [{"table": "sessions", "role": "lambda_x", "operations": ["SELECT"]}]
}Action on warning:
- If the table should exist → fix the migration that failed to create it; redeploy
- If the table is intentionally deferred → no action; next deploy re-applies the GRANT once the table exists
- If the table was dropped → remove the corresponding GRANT from
permissions.sql
S12: Schema + Permission Change in Same Deploy
MigrateDSQL always runs migrations BEFORE permissions in the same invocation. So adding a table + the GRANTs for it in the same deploy works: the migration creates the table; the permissions step then GRANTs on the now-existing table. No special ordering required.
Order:
applyMigrationDsql()for each pending migrationapplyPermissions()(CREATE ROLE + GRANT + IAM mappings)applyPermissionsalso drops orphanlambda_*roles not inpermissions.sql(F13 fix)validatePermissions()(logs gaps as warnings)
If the migration fails partway, permissions is skipped for that invocation. Re-deploy after fixing the migration.
S13: Mid-Batch Failure Recovery (idempotent retry)
If a deploy fails mid-batch (e.g., 3 of 5 migrations succeeded before one failed), the migration runner has idempotent error handling for these PG codes:
| PG Code | Meaning |
|---|---|
| 42P07 | Relation already exists |
| 42P01 | Relation does not exist |
| 42701 | Column already exists |
| 42710 | Object (role, grant) already exists |
The next deploy will re-run the same batch. The already-applied migrations no-op via these codes. The failing migration runs again — fix the failure first, then redeploy.
Recreation pipeline exception: The needs_recreation path is NOT wrapped in idempotent catch. If a recreation fails mid-step (e.g., temp table created, INSERT succeeded, but DROP original failed), partial state must be resolved before retry. See "Recreation recovery" below.
S14: Nullable → Backfill → SET NOT NULL
For columns where you need NOT NULL semantics but want explicit control over the backfill value, use a 3-step migration (3 separate .sql files, 3 journal entries):
-- migrations/NNNN_add_priority_nullable.sql (S1 path — additive ALTER)
ALTER TABLE "my_table" ADD COLUMN "priority" integer;-- migrations/NNNN+1_backfill_priority.sql (S9 path — data migration)
UPDATE "my_table" SET "priority" = 1 WHERE "is_active" = true;
UPDATE "my_table" SET "priority" = 0 WHERE "is_active" = false;-- migrations/NNNN+2_priority_set_not_null.sql (recreation path)
ALTER TABLE "my_table" ALTER COLUMN "priority" SET NOT NULL;The final SET NOT NULL is classified as RECREATION (DSQL requires recreation for column constraint changes). DSQL will reject the SET NOT NULL if any row has NULL — that's the safety net that proves your backfill is complete.
check-dsql verdicts: [1] OK, [1] OK, [1] RECREATIONRuntime per step: ~80ms, depends on row count, ~1000ms
S15: Concurrent MigrateDSQL Invocations (OCC)
MigrateDSQL acquires a Postgres advisory lock at the start of each invocation. Concurrent invocations are serialized — the second invocation waits for the first to release the lock, then re-checks __drizzle_migrations and applies only what's still pending. Idempotent.
Caveat (F16, noted): The advisory lock covers migration application but NOT permission grants. If two MigrateDSQL invocations race past the migration step concurrently, one of them may see 9+ GRANT errors as the other holds in-flight grant locks. The errors are harmless — the next deploy re-applies the GRANTs — but they show up in the response payload.
If you see permissions.errors=[...] in a deploy result that otherwise looks healthy, check for concurrent invocations (e.g., a queued Terraform apply happening at the same time).
Recovery Procedures
Recreation recovery (mid-step failure)
If a recreation pipeline aborts mid-step, your DSQL state could be one of:
| State | Symptoms | Recovery |
|---|---|---|
| Temp table exists, data not copied | <table>_recreate_temp exists; original intact | DROP TABLE <table>_recreate_temp then redeploy |
| Temp table exists, data copied, original not dropped | Both <table> and <table>_recreate_temp exist; data duplicated | Manual decision: keep original (DROP temp) or keep temp (DROP original + RENAME temp) |
| Original dropped, temp not renamed | <table> does not exist; <table>_recreate_temp has data | ALTER TABLE <table>_recreate_temp RENAME TO <table> |
| Original dropped, temp does not exist | DATA LOSS — restore from backup | Restore from DSQL backup |
-- Connect with admin credentials to manually resolve:
SELECT tablename FROM pg_tables WHERE schemaname = 'public' AND tablename LIKE '%recreate_temp%';
-- Inspect both tables, decide, then DROP/RENAME accordingly.After resolving, run mantle check migrations and redeploy. The migration runner's idempotent path handles the rest.
Deploy deadlock recovery (legacy data source mechanism)
Only relevant if your migrate_dsql.tf was generated before the F8 fix (identifiable by a data "aws_lambda_invocation" "run_migration" block). The fix removed the data source — but if your instance hasn't been re-generated with the new framework, you can deadlock when a failed migration prevents the Lambda code from being updated.
# 1. Remove the failing migration from source
rm migrations/NNNN_failed.sql
# Edit migrations/meta/_journal.json to remove the entry
npx mantle check migrations # must show clean
# 2. Rebuild and force-update Lambda code directly (bypass TF)
rm -rf build && npx mantle build
cd build/lambdas/MigrateDSQL && zip -qr /tmp/recovery.zip . && cd ../../..
AWS_PROFILE=<profile> aws lambda update-function-code \
--function-name <stage>-MigrateDSQL \
--zip-file fileb:///tmp/recovery.zip
# 3. Manually invoke to verify clean run
AWS_PROFILE=<profile> aws lambda invoke \
--function-name <stage>-MigrateDSQL \
--payload '{"source":"manual-recovery"}' \
--cli-binary-format raw-in-base64-out /tmp/recovery-result.json
cat /tmp/recovery-result.json
# Expect: {"migrations":{"applied":0,"migrations":[]},"permissions":{"errors":[]}}
# 4. Regenerate infra (removes the legacy data source) and redeploy
npx mantle generate infra
AWS_PROFILE=<profile> npx mantle deploy --stage <stage>Orphan PG role recovery
If you see lambda_* roles in DSQL that are no longer in permissions.sql (e.g., from a removed Lambda):
The framework drops them automatically. applyPermissions (F13 fix) detects lambda_* roles not present in permissions.sql and issues DROP ROLE IF EXISTS for each, along with their sys.iam_pg_role_mappings entries. The result payload's permissions.droppedRoles field surfaces this for visibility.
If for some reason the auto-drop didn't run (older framework version), manual cleanup:
-- Connect as admin
DROP ROLE IF EXISTS lambda_old_handler_name;
-- IAM mapping is cleaned up automatically when the PG role is droppedOrphan __drizzle_migrations record (broken migration recorded as applied)
If a deploy reported success but DSQL doesn't reflect the schema change (rare — only happens with framework bugs like F11 pre-fix), __drizzle_migrations has a record claiming the migration ran. The next deploy will skip it.
Diagnosis:
SELECT tag, hash, created_at FROM drizzle.__drizzle_migrations ORDER BY id DESC LIMIT 5;Recovery (two paths):
- Re-apply manually: Connect as admin, run the SQL by hand. The
__drizzle_migrationsrecord stays the same. - Force re-run: Delete the record from
__drizzle_migrations, then redeploy. The migration runs again with the new framework code.
DELETE FROM drizzle.__drizzle_migrations WHERE tag = '<orphan_tag>';Permission Verification Recipes
Verify a Lambda's grants (avoid F2 false negative)
-- Use pg_class.relacl, NOT information_schema.role_table_grants
SELECT
c.relname AS table_name,
c.relacl AS acl
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
AND c.relkind = 'r'
AND c.relname = '<your_table>';ACL letters: r=SELECT, w=UPDATE, a=INSERT, d=DELETE, D=TRUNCATE, x=REFERENCES, t=TRIGGER, *=GRANT OPTION
Verify IAM↔PG role mappings
SELECT pg_role, iam_arn FROM sys.iam_pg_role_mappings WHERE pg_role LIKE 'lambda_%';Verify the runner skipped the right migrations
SELECT id, tag, hash, created_at
FROM drizzle.__drizzle_migrations
ORDER BY id DESC
LIMIT 10;The hash should match sha256(migrations/<tag>.sql). If the file has been edited after being applied, the hash will mismatch — the runner will treat it as a new migration and try to apply it, which is usually wrong.
Squashing Migration History (Phase 7 Consolidation)
When historical migrations accumulate enough that the audit trail is more noise than signal, consolidate them into a single 0000_baseline.sql representing current schema state. Staging only.
# 1. Generate consolidated SQL via mantle db clone (Docker PG with all migrations replayed)
npx mantle db clone --stage staging
# 2. Export schema from the local clone
pg_dump --schema-only -h localhost -p 5433 -U postgres postgres > migrations/base/tables.sql
# 3. Hand-review tables.sql for DSQL-incompatible patterns (sequences, etc.)
# 4. Copy current permissions
cp permissions/permissions.sql migrations/base/permissions.sql
# 5. Archive old migrations
mkdir -p migrations/archive
mv migrations/[0-9]*.sql migrations/archive/
mv migrations/meta/_journal.json migrations/meta/_journal.pre-consolidation.json
# 6. Replace with baseline
mv migrations/base/tables.sql migrations/0000_baseline.sql
mv migrations/base/permissions.sql permissions/permissions.sql
# 7. Create new journal with single baseline entry
# (see migrations/meta/_journal.json template in the framework docs)
# 8. Mark baseline as already applied — see "Critical" section below
# 9. Deploy — should report 0 new migrations
AWS_PROFILE=<profile> npx mantle deploy --stage stagingCritical: Step 8 is the key gotcha. The runner tracks applied migrations by SHA-256 hash. If __drizzle_migrations still has the pre-consolidation history, the new baseline appears as a new migration — and applying it against an already-populated schema may fail or produce noisy idempotent skips. Use the framework CLI command to reconcile:
# Mark the baseline as already applied (preserves audit history — adds a row)
npx mantle db mark-applied --file migrations/0000_baseline.sql --endpoint <DSQL_ENDPOINT> --region <REGION>
# OR mark + truncate previous history (full consolidation — destructive, requires --yes)
npx mantle db mark-applied --file migrations/0000_baseline.sql --reset --yesIf the file's hash already exists in __drizzle_migrations, the command is a no-op (idempotent).
See Database Changes — Squashing Migration History for the full procedure.
Anti-Patterns (Things to NOT Do)
- Do NOT edit a migration SQL file after it has been applied. The hash mismatch will trip the runner and may attempt to re-apply against existing state.
- Do NOT use
information_schema.role_table_grantsfrom a non-owner connection to verify permissions. It filters out grants you don't own — usepg_class.relacl. - Do NOT skip
mantle generate permissionsbetween schema changes and deploys. ThevalidatePermissionssafety net catches gaps post-deploy but the warning is non-blocking; the right thing is to regenerate first. - Do NOT add a journal entry without a corresponding SQL file. The runner will fail at startup.
- Do NOT add a SQL file without a journal entry. The runner will not see it.
- Do NOT hand-edit
__drizzle_migrationsrows except for documented recovery procedures (orphan record purge, Phase 7 consolidation). - Do NOT run
mantle deploywithoutmantle buildfirst if you have outstanding source changes (the framework auto-builds since F6, but explicitmantle buildfollowed by agit statusclean check is the right hygiene). - Do NOT assume
ALTER TABLE ADD COLUMN ... DEFAULT Nis a metadata-only operation in DSQL. It is rejected on populated tables. The classifier routes it to recreation automatically — trust the recreation path.
Related Guides
- Database Changes — Full reference: DSQL constraints, validation tooling, troubleshooting
- Database — Schema definition, providers, CRUD operations
- Migration — Drizzle Kit workflow, DSQL runner, MigrateDSQL Lambda
- Entity Queries —
@RequiresTablepermission declarations - Deployment — Full deploy workflow