Skip to content

Agent Setup

The dbward agent is the only component that connects to your database. It polls the server for approved jobs, executes them, and returns results.

The agent uses outbound HTTP only. It periodically polls the server for dispatched jobs — no inbound ports or firewall rules required. This simplifies deployment behind NATs, VPNs, and corporate firewalls.

Each agent advertises its capabilities: databases, environments, and operations. The server uses these to route jobs to the correct agent. Only agents whose capabilities match a job will receive it during polling.

Multiple agents can connect to the same server simultaneously. When two agents poll and receive the same job, the claim endpoint resolves the race — only one agent wins the claim, the other receives a conflict response and moves on.

After claiming a job, the agent has lease_duration_secs to complete it. During execution, the agent sends periodic heartbeats to extend the lease. If the agent crashes or loses connectivity, the lease expires and the job is marked execution_lost.

On startup and periodically thereafter, the agent collects schema metadata (tables, columns, row estimates, foreign keys) from each configured database and sends it to the server. This powers risk scoring, SQL review, and max_estimated_rows checks.

For preview and approval context, the agent can execute EXPLAIN (read-only, no side effects) against the target database. The execution plan is attached to the request so approvers can assess impact before approving.

If a database connection is lost, the agent skips jobs targeting that database but continues serving other configured databases normally. It retries the failed connection each poll interval and resumes when connectivity is restored.

Full configuration reference with all options: Configuration

agent_id = "prod-agent-1" # Unique identifier (shown in audit logs)
poll_interval_ms = 1000 # How often to poll for jobs (default: 1000)
lease_duration_secs = 300 # Job lease timeout (default: 300)
drain_timeout_secs = 60 # Graceful shutdown timeout (default: 60)
max_concurrent_tasks = 2 # Parallel job execution (default: 2)
statement_timeout_secs = 30 # Default SQL statement timeout (default: 30)

Note: statement_timeout_secs is the agent-level default. If a server-side [[execution_policies]] is configured for the target database/environment, the policy’s statement_timeout_secs takes precedence.

Migration warning: This timeout also applies to migrations. The default 30 seconds is likely too short for DDL operations on large tables. Configure a longer timeout via [[execution_policies]] for environments where migrations run.

[server]
url = "https://dbward.internal:3000"
agent_token = "${DBWARD_AGENT_TOKEN}" # Agent token (created with --agent flag)

The agent token must be created with --subject-type agent:

Terminal window
dbward token create --subject prod-agent-1 --subject-type agent --role agent-default

Capabilities determine which jobs this agent can handle. Databases and environments are derived automatically from the [databases] section keys. Only operations is configurable:

# Optional: limit which operations this agent handles (default: all)
# operations = ["execute_select", "execute_dml", "migrate_up", "migrate_down", "migrate_status"]

For example, if the config has [databases.app.production] and [databases.app.staging], the agent advertises capabilities for database app, environments production and staging.

[databases.app.production]
url = "postgres://user:pass@localhost:5432/mydb"
# migrations_dir = "db/migrations" # Optional: override for this database
[databases.analytics.production]
url = "mysql://user:pass@analytics.internal:3306/warehouse"

Supported URL schemes:

  • postgres:// or postgresql:// — PostgreSQL
  • mysql:// — MySQL

Deploy multiple agents for different databases or environments:

┌──────────────┐
│ Server │
└──┬───────┬───┘
│ │
▼ ▼
Agent A Agent B
(prod) (staging + analytics)

Agent A (prod-agent.toml):

agent_id = "prod-agent"
[server]
url = "https://dbward.internal:3000"
agent_token = "${DBWARD_AGENT_TOKEN}"
[databases.app.production]
url = "postgres://...@prod-db:5432/app"

Agent B (staging-agent.toml):

agent_id = "staging-agent"
[server]
url = "https://dbward.internal:3000"
agent_token = "${DBWARD_AGENT_TOKEN}"
[databases.app.staging]
url = "postgres://...@staging-db:5432/app"
[databases.analytics.staging]
url = "mysql://...@analytics:3306/warehouse"

The server automatically routes jobs based on each agent’s [databases] keys.

1. Agent polls: POST /api/agent/poll (with capabilities)
2. Server returns a dispatched job (if any match)
3. Agent claims: POST /api/agent/jobs/{id}/claim
4. Agent executes the SQL against the target database
5. Agent returns result: POST /api/agent/jobs/{id}/result
6. Server relays result to the waiting client

The agent sends periodic heartbeats during execution to extend the lease. If the agent crashes, the lease expires and the job is marked execution_lost.

/etc/systemd/system/dbward-agent.service
[Unit]
Description=dbward agent
After=network.target
[Service]
Type=simple
User=dbward
Environment=DBWARD_AGENT_TOKEN=dbw_...
Environment=DATABASE_URL=postgres://...
ExecStart=/usr/local/bin/dbward agent --config /etc/dbward/dbward-agent.toml
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Terminal window
docker run -d \
--name dbward-agent \
--network db-network \
-e DBWARD_AGENT_TOKEN=dbw_... \
-e DATABASE_URL=postgres://user:pass@db:5432/app \
-v ./dbward-agent.toml:/etc/dbward/dbward-agent.toml:ro \
ghcr.io/dbward-dev/dbward-agent:latest \
--config /etc/dbward/dbward-agent.toml

On startup, the agent:

  1. Creates a liveness probe immediately (/tmp/dbward-agent-alive)
  2. Retries connecting to the server (fetch public key + initial poll)
  3. Retries connecting to each configured database
  4. Once all prerequisites pass, creates a readiness probe (/tmp/dbward-agent-ready)

If the server or database is temporarily unavailable, the agent retries with exponential backoff (1s → 2s → 4s → 8s → 15s cap) instead of exiting. This prevents CrashLoopBackOff in Kubernetes when services start simultaneously.

Hard errors cause immediate exit (no retry):

  • Authentication failures (wrong token, wrong DB password)
  • Invalid configuration (unsupported URL scheme, 4xx from server)

Transient errors are retried:

  • Connection refused, timeouts, DNS failures, 5xx responses

During normal operation:

  • If the server becomes unreachable, the agent stops accepting new jobs (readiness removed) but stays alive and retries
  • If a database connection is lost during job execution, the agent enters degraded mode: stops accepting jobs, attempts reconnection every poll interval, and resumes when connectivity is restored
  • SIGTERM/SIGINT triggers graceful shutdown at any phase (including during startup retries)

Server-side visibility: From the operator’s perspective, the server considers an agent offline if no poll is received for 60 seconds — regardless of the cause (process crash, network partition, or host failure). Query GET /api/agents (requires metrics.view permission) to check. See Server Health checks.

startup_retry_initial_ms = 1000 # Initial retry delay (default: 1000)
startup_retry_max_ms = 15000 # Max retry delay cap (default: 15000)
startup_max_wait_secs = 60 # default 60s, 0 = retry forever

On SIGTERM/SIGINT:

  1. Stops accepting new jobs
  2. Waits for in-flight jobs to complete (up to drain_timeout_secs)
  3. Exits cleanly
  • Least privilege: Create a dedicated database user for the agent with only the permissions needed (e.g., SELECT + DML, no DDL for production)
  • Network isolation: The agent only needs outbound access to the server and the database. No inbound ports required.
  • Token rotation: Rotate agent tokens periodically. Create a new token, update the agent config, restart, then revoke the old token.
  • Environment variables: Use ${ENV_VAR} in TOML to avoid hardcoding credentials.