Agent Setup
Agent Setup
Section titled “Agent Setup”The dbward agent is the only component that connects to your database. It polls the server for approved jobs, executes them, and returns results.
How the Agent Works
Section titled “How the Agent Works”Polling architecture
Section titled “Polling architecture”The agent uses outbound HTTP only. It periodically polls the server for dispatched jobs — no inbound ports or firewall rules required. This simplifies deployment behind NATs, VPNs, and corporate firewalls.
Capabilities matching
Section titled “Capabilities matching”Each agent advertises its capabilities: databases, environments, and operations. The server uses these to route jobs to the correct agent. Only agents whose capabilities match a job will receive it during polling.
Multi-agent
Section titled “Multi-agent”Multiple agents can connect to the same server simultaneously. When two agents poll and receive the same job, the claim endpoint resolves the race — only one agent wins the claim, the other receives a conflict response and moves on.
Lease and heartbeat
Section titled “Lease and heartbeat”After claiming a job, the agent has lease_duration_secs to complete it. During execution, the agent sends periodic heartbeats to extend the lease. If the agent crashes or loses connectivity, the lease expires and the job is marked execution_lost.
Schema sync
Section titled “Schema sync”On startup and periodically thereafter, the agent collects schema metadata (tables, columns, row estimates, foreign keys) from each configured database and sends it to the server. This powers risk scoring, SQL review, and max_estimated_rows checks.
Dry-run EXPLAIN
Section titled “Dry-run EXPLAIN”For preview and approval context, the agent can execute EXPLAIN (read-only, no side effects) against the target database. The execution plan is attached to the request so approvers can assess impact before approving.
Degraded mode
Section titled “Degraded mode”If a database connection is lost, the agent skips jobs targeting that database but continues serving other configured databases normally. It retries the failed connection each poll interval and resumes when connectivity is restored.
Configuration reference
Section titled “Configuration reference”Full configuration reference with all options: Configuration
Agent identity
Section titled “Agent identity”agent_id = "prod-agent-1" # Unique identifier (shown in audit logs)poll_interval_ms = 1000 # How often to poll for jobs (default: 1000)lease_duration_secs = 300 # Job lease timeout (default: 300)drain_timeout_secs = 60 # Graceful shutdown timeout (default: 60)max_concurrent_tasks = 2 # Parallel job execution (default: 2)statement_timeout_secs = 30 # Default SQL statement timeout (default: 30)Note:
statement_timeout_secsis the agent-level default. If a server-side[[execution_policies]]is configured for the target database/environment, the policy’sstatement_timeout_secstakes precedence.
Migration warning: This timeout also applies to migrations. The default 30 seconds is likely too short for DDL operations on large tables. Configure a longer timeout via
[[execution_policies]]for environments where migrations run.
Server connection
Section titled “Server connection”[server]url = "https://dbward.internal:3000"agent_token = "${DBWARD_AGENT_TOKEN}" # Agent token (created with --agent flag)The agent token must be created with --subject-type agent:
dbward token create --subject prod-agent-1 --subject-type agent --role agent-defaultCapabilities
Section titled “Capabilities”Capabilities determine which jobs this agent can handle. Databases and environments are derived automatically from the [databases] section keys. Only operations is configurable:
# Optional: limit which operations this agent handles (default: all)# operations = ["execute_select", "execute_dml", "migrate_up", "migrate_down", "migrate_status"]For example, if the config has [databases.app.production] and [databases.app.staging], the agent advertises capabilities for database app, environments production and staging.
Database connections
Section titled “Database connections”[databases.app.production]url = "postgres://user:pass@localhost:5432/mydb"# migrations_dir = "db/migrations" # Optional: override for this database
[databases.analytics.production]url = "mysql://user:pass@analytics.internal:3306/warehouse"Supported URL schemes:
postgres://orpostgresql://— PostgreSQLmysql://— MySQL
Multiple agents
Section titled “Multiple agents”Deploy multiple agents for different databases or environments:
┌──────────────┐│ Server │└──┬───────┬───┘ │ │ ▼ ▼Agent A Agent B(prod) (staging + analytics)Agent A (prod-agent.toml):
agent_id = "prod-agent"
[server]url = "https://dbward.internal:3000"agent_token = "${DBWARD_AGENT_TOKEN}"
[databases.app.production]url = "postgres://...@prod-db:5432/app"Agent B (staging-agent.toml):
agent_id = "staging-agent"
[server]url = "https://dbward.internal:3000"agent_token = "${DBWARD_AGENT_TOKEN}"
[databases.app.staging]url = "postgres://...@staging-db:5432/app"
[databases.analytics.staging]url = "mysql://...@analytics:3306/warehouse"The server automatically routes jobs based on each agent’s [databases] keys.
Job execution flow
Section titled “Job execution flow”1. Agent polls: POST /api/agent/poll (with capabilities)2. Server returns a dispatched job (if any match)3. Agent claims: POST /api/agent/jobs/{id}/claim4. Agent executes the SQL against the target database5. Agent returns result: POST /api/agent/jobs/{id}/result6. Server relays result to the waiting clientThe agent sends periodic heartbeats during execution to extend the lease. If the agent crashes, the lease expires and the job is marked execution_lost.
Running with systemd
Section titled “Running with systemd”[Unit]Description=dbward agentAfter=network.target
[Service]Type=simpleUser=dbwardEnvironment=DBWARD_AGENT_TOKEN=dbw_...Environment=DATABASE_URL=postgres://...ExecStart=/usr/local/bin/dbward agent --config /etc/dbward/dbward-agent.tomlRestart=alwaysRestartSec=5
[Install]WantedBy=multi-user.targetRunning with Docker
Section titled “Running with Docker”docker run -d \ --name dbward-agent \ --network db-network \ -e DBWARD_AGENT_TOKEN=dbw_... \ -e DATABASE_URL=postgres://user:pass@db:5432/app \ -v ./dbward-agent.toml:/etc/dbward/dbward-agent.toml:ro \ ghcr.io/dbward-dev/dbward-agent:latest \ --config /etc/dbward/dbward-agent.tomlStartup and resilience
Section titled “Startup and resilience”On startup, the agent:
- Creates a liveness probe immediately (
/tmp/dbward-agent-alive) - Retries connecting to the server (fetch public key + initial poll)
- Retries connecting to each configured database
- Once all prerequisites pass, creates a readiness probe (
/tmp/dbward-agent-ready)
If the server or database is temporarily unavailable, the agent retries with exponential backoff (1s → 2s → 4s → 8s → 15s cap) instead of exiting. This prevents CrashLoopBackOff in Kubernetes when services start simultaneously.
Hard errors cause immediate exit (no retry):
- Authentication failures (wrong token, wrong DB password)
- Invalid configuration (unsupported URL scheme, 4xx from server)
Transient errors are retried:
- Connection refused, timeouts, DNS failures, 5xx responses
Runtime resilience
Section titled “Runtime resilience”During normal operation:
- If the server becomes unreachable, the agent stops accepting new jobs (readiness removed) but stays alive and retries
- If a database connection is lost during job execution, the agent enters degraded mode: stops accepting jobs, attempts reconnection every poll interval, and resumes when connectivity is restored
- SIGTERM/SIGINT triggers graceful shutdown at any phase (including during startup retries)
Server-side visibility: From the operator’s perspective, the server considers an agent offline if no poll is received for 60 seconds — regardless of the cause (process crash, network partition, or host failure). Query GET /api/agents (requires metrics.view permission) to check. See Server Health checks.
Configuration
Section titled “Configuration”startup_retry_initial_ms = 1000 # Initial retry delay (default: 1000)startup_retry_max_ms = 15000 # Max retry delay cap (default: 15000)startup_max_wait_secs = 60 # default 60s, 0 = retry foreverGraceful shutdown
Section titled “Graceful shutdown”On SIGTERM/SIGINT:
- Stops accepting new jobs
- Waits for in-flight jobs to complete (up to
drain_timeout_secs) - Exits cleanly
Security considerations
Section titled “Security considerations”- Least privilege: Create a dedicated database user for the agent with only the permissions needed (e.g., SELECT + DML, no DDL for production)
- Network isolation: The agent only needs outbound access to the server and the database. No inbound ports required.
- Token rotation: Rotate agent tokens periodically. Create a new token, update the agent config, restart, then revoke the old token.
- Environment variables: Use
${ENV_VAR}in TOML to avoid hardcoding credentials.
See also
Section titled “See also”- Server setup — Configure the server
- Authentication — Token management and OIDC
- Workflows — Configure approval rules
- Troubleshooting — Common deployment issues