Troubleshooting
Deployment Troubleshooting
Agent not picking up jobs
Symptoms: Requests stay in dispatched status; agent logs show no activity.
Causes:
- Token mismatch: The agent token doesn’t match what the server expects.

      # Verify token is set
      echo $DBWARD_AGENT_TOKEN
      # Compare with server's agent-token file

- Capabilities mismatch: The agent’s [databases.<name>.<env>] keys don’t match the request’s database + environment.

      # Check agent logs for registered capabilities
      grep "capabilities" /var/log/dbward-agent.log

- Network: The agent cannot reach the server.

      curl http://server-host:3000/health

Fix: Ensure the agent token is correct, database/environment names match the server config, and the agent can reach the server on port 3000.
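The token comparison above can be scripted. The following is a minimal sketch, not part of dbward: `token_matches_file` is a hypothetical helper, and it assumes the server's agent-token file stores the raw token (optionally followed by a newline). If your server stores a hash instead, this comparison does not apply.

```shell
# Hypothetical helper (not shipped with dbward).
# Assumption: the file passed as $1 contains the raw agent token.
# Returns 0 on match, 1 on mismatch, 2 if the token or file is unavailable.
token_matches_file() {
  [ -n "${DBWARD_AGENT_TOKEN:-}" ] || { echo "DBWARD_AGENT_TOKEN is not set" >&2; return 2; }
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # Command substitution strips trailing newlines from the file contents.
  [ "$DBWARD_AGENT_TOKEN" = "$(cat "$1")" ]
}

# Usage (the path is a placeholder for wherever your agent-token file lives):
#   token_matches_file /path/to/agent-token && echo "token matches"
```

Comparing inside a function avoids echoing the token to the terminal or shell history.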
Lease expired / execution_lost
Symptoms: Job shows execution_lost status after timeout.
Causes:
- Statement timeout too short: The query takes longer than statement_timeout_secs (default: 30s). Especially common with DDL/migrations.
- Agent crash: The agent died mid-execution. Check logs for panics or OOM kills.
- Network interruption: Heartbeats failed to reach the server.
Fix:
- Increase statement_timeout_secs in agent config or use [[execution_policies]] on the server for specific environments.
- For migrations, set a longer timeout via execution policies.
- Check container/process health and resource limits.
“no matching workflow” error
Symptoms: Request immediately rejected with “no matching workflow”.
Cause: The database + environment combination in the request doesn’t match any [[workflows]] entry in the server config. dbward is fail-closed: if no workflow matches, the request is rejected.
Fix: Add a [[workflows]] section for the target database/environment pair:
    [[workflows]]
    database = "app"
    environment = "staging"

    [[workflows.steps]]
    type = "approval"

    [[workflows.steps.approvers]]
    role = "admin"
    min = 1

SQLite locked
Symptoms: Server returns 500 errors with “database is locked” in logs.
Cause: Multiple server processes are writing to the same SQLite file. dbward-server must run as a single replica.
Fix:
- Ensure only one server instance uses the same state_dir / PVC.
- On Kubernetes, use strategy: Recreate (not RollingUpdate) for the server Deployment.
- On ECS, set desired count to 1 for the server service.
OIDC validation errors
Symptoms: Login fails with “invalid token” or “issuer mismatch”.
Causes:
- Issuer mismatch: The issuer in server config doesn’t match the iss claim in the JWT.
- Audience mismatch: The audience in server config doesn’t match the aud claim.
- Clock skew: Server time is off by more than the allowed leeway.
Fix:
    # Decode a token to check claims
    echo $TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq .

    # Verify issuer matches config
    grep issuer /etc/dbward/server.toml

Ensure [auth.oidc] issuer and audience match your identity provider’s configuration exactly.
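JWT segments are base64url-encoded without padding, so a plain `base64 -d` can fail or truncate output on some tokens. A more tolerant decode might look like the sketch below. `jwt_payload` is a hypothetical helper name, and the sketch assumes a `base64` that accepts `-d` (GNU coreutils; older macOS versions use `-D`).

```shell
# Hypothetical helper: print the decoded payload (second segment) of a JWT.
# Assumption: `base64 -d` is available (GNU coreutils).
jwt_payload() {
  local seg
  # Take the second dot-separated segment and map base64url characters
  # back to the standard alphabet ('_' -> '/', '-' -> '+').
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Usage:
#   jwt_payload "$TOKEN" | jq '{iss, aud, exp}'
```

This only decodes the claims for inspection; it does not verify the token's signature.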