Operations
Production deployment and monitoring.
Running as a Service
systemd
Create /etc/systemd/system/dagu.service:
[Unit]
Description=Dagu Workflow Engine
Documentation=https://docs.dagu.sh/
After=network.target
Wants=network-online.target
[Service]
Type=simple
User=dagu
Group=dagu
WorkingDirectory=/opt/dagu
# Main process
ExecStart=/usr/local/bin/dagu start-all
# Graceful shutdown
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=60
StartLimitBurst=3
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
# Environment
EnvironmentFile=-/etc/dagu/environment
Environment="DAGU_HOME=/opt/dagu"
[Install]
WantedBy=multi-user.targetCreate /etc/dagu/environment:
DAGU_HOST=0.0.0.0
DAGU_PORT=8080
DAGU_TZ=America/New_York
DAGU_LOG_FORMAT=jsonSetup:
# Create user and directories
sudo useradd -r -s /bin/false dagu
sudo mkdir -p /opt/dagu/{dags,data,logs}
sudo chown -R dagu:dagu /opt/dagu
# Enable and start
sudo systemctl enable dagu
sudo systemctl start dagu
# Check status
sudo systemctl status dagu
sudo journalctl -u dagu -fDocker Compose
compose.yml:
version: '3.8'
services:
dagu:
image: ghcr.io/dagucloud/dagu:latest
container_name: dagu
restart: unless-stopped
# Health check
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
# Port mapping
ports:
- "8080:8080"
# Environment variables
environment:
# Server configuration
- DAGU_PORT=8080
- DAGU_HOST=0.0.0.0
- DAGU_TZ=America/New_York
# Logging
- DAGU_LOG_FORMAT=json
# Authentication (optional)
# - DAGU_AUTH_BASIC_USERNAME=admin
# - DAGU_AUTH_BASIC_PASSWORD=your-secure-password
# User/Group IDs (optional)
# - PUID=1000
# - PGID=1000
# Docker-in-Docker support (optional)
# - DOCKER_GID=999
# Volume mounts
volumes:
- dagu:/var/lib/dagu
# Docker socket for Docker executor (optional)
# - /var/run/docker.sock:/var/run/docker.sock
# Logging configuration
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "5"
volumes:
dagu-data:
dagu-logs:# Start
docker compose up -d
# Logs
docker compose logs -f
# Stop
docker compose downWith authentication (.env file):
DAGU_AUTH_BASIC_USERNAME=admin
DAGU_AUTH_BASIC_PASSWORD=secure-passwordResource Monitoring
Dagu provides built-in resource monitoring that tracks CPU, memory, disk, and load average. The data is displayed in the System Status page of the web UI.
Configuration:
# config.yaml
monitoring:
retention: "24h" # How long to keep history (default: 24h)
interval: "5s" # Collection frequency (default: 5s)# Or via environment variables
export DAGU_MONITORING_RETENTION=12h
export DAGU_MONITORING_INTERVAL=10sMetrics collected:
- CPU Usage - Overall CPU utilization percentage
- Memory Usage - RAM utilization percentage
- Disk Usage - Disk space utilization for the data directory
- Load Average - 1-minute system load average
API Endpoint:
# Get resource history (last hour by default)
curl http://localhost:8080/api/v1/services/resources/history
# Get last 30 minutes
curl http://localhost:8080/api/v1/services/resources/history?duration=30mMemory Usage: Resource history is stored in memory. With default settings (5s interval, 24h retention), memory usage is approximately 1.1MB for all 4 metrics.
Prometheus Metrics
Metrics available at /api/v1/metrics:
System:
dagu_info- Build informationdagu_uptime_seconds- Uptimedagu_scheduler_running- Scheduler status
DAGs:
dagu_dags_total- Total DAGsdagu_dag_runs_currently_running- Running DAGsdagu_dag_runs_queued_total- Queued DAGsdagu_dag_runs_total- DAG runs by status (24h)
Workers:
dagu_workers_registered- Registered distributed workersdagu_worker_info- Worker heartbeat labels as key/value metadatadagu_worker_heartbeat_timestamp_seconds- Last worker heartbeat timestampdagu_worker_health_status- Worker health by heartbeat freshnessdagu_worker_pollers- Worker poller capacity by statedagu_worker_running_tasks- Running tasks per workerdagu_worker_oldest_running_task_age_seconds- Oldest running task age per worker
Standard:
- Go runtime metrics
- Process metrics
Logging
# config.yaml
log_format: json # text or json
debug: true # Debug mode
paths:
log_dir: /var/log/dagu# Or via environment
export DAGU_LOG_FORMAT=json
export DAGU_DEBUG=true
export DAGU_LOG_DIR=/var/log/daguJSON log example:
{
"time": "2024-03-15T12:00:00Z",
"level": "INFO",
"msg": "DAG execution started",
"dag": "data-pipeline",
"run_id": "20240315_120000_abc123"
}Log Cleanup
Automatic cleanup based on hist_retention_days:
# Per-DAG
hist_retention_days: 7 # Keep 7 days
# Or global in base.yaml
hist_retention_days: 30 # DefaultNotes:
hist_retention_daysdefaults to30.hist_retention_days: 0also uses the default30-day retention.- Negative values disable automatic cleanup.
Deletes:
- Execution logs
- Step output (.out, .err)
- Status files (.jsonl)
- Sub DAG logs
Viewing Run History
Before cleaning up logs, review execution history with dagu history:
# Preview what cleanup would affect
dagu history my-workflow --limit 100
# Check run status before deletion
dagu history my-workflow --from 2025-01-01 --to 2025-12-31The history command helps:
- Identify which runs to keep/delete
- Verify cleanup results
- Export run metadata before cleanup:
dagu history --format json
See history CLI reference and cleanup command.
Alerting
For team-managed notification routes, use Notifications in the Web UI. It supports reusable Slack, email, Telegram, and webhook channels with Global and workspace rules. For managed provider incidents, use Incident Routing. On self-hosted Dagu, these Web UI routing features require an active license or trial.
Email
# base.yaml
smtp:
host: "smtp.gmail.com"
port: "587"
username: "notifications@company.com"
password: "${SMTP_PASSWORD}"
error_mail:
from: "dagu@company.com"
to: "ops-team@company.com"
prefix: "[ERROR]"
attach_logs: true
mail_on:
failure: true
success: falsePer-step notification:
steps:
- id: critical_task
run: echo "Processing"
mail_on_error: trueWebhooks
Slack:
handler_on:
failure:
action: http.request
with:
url: "${SLACK_WEBHOOK_URL}"
method: POST
body: |
{
"text": "Workflow Failed: ${DAG_NAME}",
"blocks": [{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Run ID:* ${DAG_RUN_ID}"
}
}]
}For managed provider incidents that open after final failure and resolve on recovery, prefer Incident Routing. The YAML pattern below is still useful when the provider call should travel with one workflow definition.
PagerDuty:
handler_on:
failure:
action: http.request
with:
url: https://events.pagerduty.com/v2/enqueue
body: |
{
"routing_key": "${PAGERDUTY_KEY}",
"event_action": "trigger",
"payload": {
"summary": "Failed: ${DAG_NAME}",
"severity": "error"
}
}Security
Environment Variable Filtering
Dagu filters the process environment before it builds the step execution environment and before it starts sub-DAG executions.
System environment variables are still available for ${VAR} expansion when Dagu parses the DAG.
Built-in forwarded variables:
- Unix and macOS exact names:
PATH,HOME,USER,SHELL,TMPDIR,TERM,EDITOR,VISUAL,LANG,LC_ALL,LC_CTYPE,TZ,LD_LIBRARY_PATH,XDG_CONFIG_HOME,XDG_DATA_HOME,XDG_CACHE_HOME,DOCKER_HOST,DOCKER_TLS_VERIFY,DOCKER_CERT_PATH,DOCKER_API_VERSION - Windows exact names:
USERPROFILE,SYSTEMROOT,WINDIR,SYSTEMDRIVE,COMSPEC,PATHEXT,TEMP,TMP,PATH,PSMODULEPATH,HOME,DOCKER_HOST,DOCKER_TLS_VERIFY,DOCKER_CERT_PATH,DOCKER_API_VERSION - Prefixes on all platforms:
DAGU_,DAG_,LC_,KUBERNETES_
Dagu also sets runtime variables with the DAG_ prefix for each step execution:
DAG_NAME,DAG_RUN_ID,DAG_RUN_STEP_NAMEDAG_RUN_LOG_FILE,DAG_RUN_STEP_STDOUT_FILE,DAG_RUN_STEP_STDERR_FILE
You can extend the forwarded set in Dagu configuration:
env_passthrough:
- SSL_CERT_FILE
- HTTP_PROXY
- HTTPS_PROXY
- NO_PROXY
env_passthrough_prefixes:
- AWS_Or with environment variables:
export DAGU_ENV_PASSTHROUGH=SSL_CERT_FILE,HTTP_PROXY,HTTPS_PROXY,NO_PROXY
export DAGU_ENV_PASSTHROUGH_PREFIXES=AWS_These settings only forward matching variables that already exist in the Dagu process environment. On Unix, matching is case-sensitive. On Windows, matching is case-insensitive.
If a variable is not forwarded automatically, you can still make it available to the step by defining it explicitly in the workflow.
Use .env or env: when you want non-sensitive step environment contents to be explicit. For credentials and other secrets, prefer the secrets: block so values are resolved at runtime and masked in logs:
# workflow.yaml
dotenv: .env.runtime # Load non-sensitive defaults from a file
secrets:
- name: AWS_ACCESS_KEY_ID
provider: env
key: PROD_AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
provider: env
key: PROD_AWS_SECRET_ACCESS_KEY
- name: DATABASE_PASSWORD
provider: file
key: /run/secrets/db-password
steps:
- id: deploy
run: aws s3 sync ./build s3://my-bucket# .env.runtime (safe to keep local; do not store secrets here)
AWS_REGION=ap-northeast-1
S3_BUCKET=my-bucketIf you need to copy a non-sensitive process variable into workflow env:, do it explicitly:
# workflow.yaml
env:
- AWS_REGION: ${AWS_REGION}
- S3_BUCKET: ${S3_BUCKET}
steps:
- id: deploy
run: aws s3 sync ./build s3://${S3_BUCKET}See Secrets for provider details and masking behavior.
Process Isolation
# systemd security features
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logsSee Also
- Server Configuration - Configure server settings
- Deployment - Installation and deployment guides
- Reference - Complete configuration reference
- Variables Reference - Environment variable usage
