Skip to content

Operations

Production deployment and monitoring.

Running as a Service

systemd

Create /etc/systemd/system/dagu.service:

ini
[Unit]
Description=Dagu Workflow Engine
Documentation=https://docs.dagu.sh/
After=network.target
Wants=network-online.target

[Service]
Type=simple
User=dagu
Group=dagu
WorkingDirectory=/opt/dagu

# Main process
ExecStart=/usr/local/bin/dagu start-all

# Graceful shutdown
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM

# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=60
StartLimitBurst=3

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs

# Resource limits
LimitNOFILE=65536
LimitNPROC=4096

# Environment
EnvironmentFile=-/etc/dagu/environment
Environment="DAGU_HOME=/opt/dagu"

[Install]
WantedBy=multi-user.target

Create /etc/dagu/environment:

bash
DAGU_HOST=0.0.0.0
DAGU_PORT=8080
DAGU_TZ=America/New_York
DAGU_LOG_FORMAT=json

Setup:

bash
# Create user and directories
sudo useradd -r -s /bin/false dagu
sudo mkdir -p /opt/dagu/{dags,data,logs}
sudo chown -R dagu:dagu /opt/dagu

# Enable and start
sudo systemctl enable dagu
sudo systemctl start dagu

# Check status
sudo systemctl status dagu
sudo journalctl -u dagu -f

Docker Compose

compose.yml:

yaml
version: '3.8'

services:
  dagu:
    image: ghcr.io/dagucloud/dagu:latest
    container_name: dagu
    restart: unless-stopped
    
    # Health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
    # Port mapping
    ports:
      - "8080:8080"
    
    # Environment variables
    environment:
      # Server configuration
      - DAGU_PORT=8080
      - DAGU_HOST=0.0.0.0
      - DAGU_TZ=America/New_York
      
      # Logging
      - DAGU_LOG_FORMAT=json
      
      # Authentication (optional)
      # - DAGU_AUTH_BASIC_USERNAME=admin
      # - DAGU_AUTH_BASIC_PASSWORD=your-secure-password
      
      # User/Group IDs (optional)
      # - PUID=1000
      # - PGID=1000
      
      # Docker-in-Docker support (optional)
      # - DOCKER_GID=999
    
    # Volume mounts
    volumes:
      - dagu:/var/lib/dagu
      
      # Docker socket for Docker executor (optional)
      # - /var/run/docker.sock:/var/run/docker.sock
    
    # Logging configuration
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

volumes:
  dagu-data:
  dagu-logs:
bash
# Start
docker compose up -d

# Logs
docker compose logs -f

# Stop
docker compose down

With authentication (.env file):

bash
DAGU_AUTH_BASIC_USERNAME=admin
DAGU_AUTH_BASIC_PASSWORD=secure-password

Resource Monitoring

Dagu provides built-in resource monitoring that tracks CPU, memory, disk, and load average. The data is displayed in the System Status page of the web UI.

Configuration:

yaml
# config.yaml
monitoring:
  retention: "24h"    # How long to keep history (default: 24h)
  interval: "5s"      # Collection frequency (default: 5s)
bash
# Or via environment variables
export DAGU_MONITORING_RETENTION=12h
export DAGU_MONITORING_INTERVAL=10s

Metrics collected:

  • CPU Usage - Overall CPU utilization percentage
  • Memory Usage - RAM utilization percentage
  • Disk Usage - Disk space utilization for the data directory
  • Load Average - 1-minute system load average

API Endpoint:

bash
# Get resource history (last hour by default)
curl http://localhost:8080/api/v1/services/resources/history

# Get last 30 minutes
curl http://localhost:8080/api/v1/services/resources/history?duration=30m

Memory Usage: Resource history is stored in memory. With default settings (5s interval, 24h retention), memory usage is approximately 1.1MB for all 4 metrics.

Prometheus Metrics

Metrics available at /api/v1/metrics:

System:

  • dagu_info - Build information
  • dagu_uptime_seconds - Uptime
  • dagu_scheduler_running - Scheduler status

DAGs:

  • dagu_dags_total - Total DAGs
  • dagu_dag_runs_currently_running - Running DAGs
  • dagu_dag_runs_queued_total - Queued DAGs
  • dagu_dag_runs_total - DAG runs by status (24h)

Workers:

  • dagu_workers_registered - Registered distributed workers
  • dagu_worker_info - Worker heartbeat labels as key/value metadata
  • dagu_worker_heartbeat_timestamp_seconds - Last worker heartbeat timestamp
  • dagu_worker_health_status - Worker health by heartbeat freshness
  • dagu_worker_pollers - Worker poller capacity by state
  • dagu_worker_running_tasks - Running tasks per worker
  • dagu_worker_oldest_running_task_age_seconds - Oldest running task age per worker

Standard:

  • Go runtime metrics
  • Process metrics

Logging

yaml
# config.yaml
log_format: json    # text or json
debug: true       # Debug mode
paths:
  log_dir: /var/log/dagu
bash
# Or via environment
export DAGU_LOG_FORMAT=json
export DAGU_DEBUG=true
export DAGU_LOG_DIR=/var/log/dagu

JSON log example:

json
{
  "time": "2024-03-15T12:00:00Z",
  "level": "INFO",
  "msg": "DAG execution started",
  "dag": "data-pipeline",
  "run_id": "20240315_120000_abc123"
}

Log Cleanup

Automatic cleanup based on hist_retention_days:

yaml
# Per-DAG
hist_retention_days: 7  # Keep 7 days

# Or global in base.yaml
hist_retention_days: 30  # Default

Notes:

  • hist_retention_days defaults to 30.
  • hist_retention_days: 0 also uses the default 30-day retention.
  • Negative values disable automatic cleanup.

Deletes:

  • Execution logs
  • Step output (.out, .err)
  • Status files (.jsonl)
  • Sub DAG logs

Viewing Run History

Before cleaning up logs, review execution history with dagu history:

bash
# Preview what cleanup would affect
dagu history my-workflow --limit 100

# Check run status before deletion
dagu history my-workflow --from 2025-01-01 --to 2025-12-31

The history command helps:

  • Identify which runs to keep/delete
  • Verify cleanup results
  • Export run metadata before cleanup: dagu history --format json

See history CLI reference and cleanup command.

Alerting

For team-managed notification routes, use Notifications in the Web UI. It supports reusable Slack, email, Telegram, and webhook channels with Global and workspace rules. For managed provider incidents, use Incident Routing. On self-hosted Dagu, these Web UI routing features require an active license or trial.

Email

yaml
# base.yaml
smtp:
  host: "smtp.gmail.com"
  port: "587"
  username: "notifications@company.com"
  password: "${SMTP_PASSWORD}"

error_mail:
  from: "dagu@company.com"
  to: "ops-team@company.com"
  prefix: "[ERROR]"
  attach_logs: true

mail_on:
  failure: true
  success: false

Per-step notification:

yaml
steps:
  - id: critical_task
    run: echo "Processing"
    mail_on_error: true

Webhooks

Slack:

yaml
handler_on:
  failure:
    action: http.request
    with:
      url: "${SLACK_WEBHOOK_URL}"
      method: POST
      body: |
        {
          "text": "Workflow Failed: ${DAG_NAME}",
          "blocks": [{
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*Run ID:* ${DAG_RUN_ID}"
            }
          }]
        }

For managed provider incidents that open after final failure and resolve on recovery, prefer Incident Routing. The YAML pattern below is still useful when the provider call should travel with one workflow definition.

PagerDuty:

yaml
handler_on:
  failure:
    action: http.request
    with:
      url: https://events.pagerduty.com/v2/enqueue
      body: |
        {
          "routing_key": "${PAGERDUTY_KEY}",
          "event_action": "trigger",
          "payload": {
            "summary": "Failed: ${DAG_NAME}",
            "severity": "error"
          }
        }

Security

Environment Variable Filtering

Dagu filters the process environment before it builds the step execution environment and before it starts sub-DAG executions.

System environment variables are still available for ${VAR} expansion when Dagu parses the DAG.

Built-in forwarded variables:

  • Unix and macOS exact names: PATH, HOME, USER, SHELL, TMPDIR, TERM, EDITOR, VISUAL, LANG, LC_ALL, LC_CTYPE, TZ, LD_LIBRARY_PATH, XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
  • Windows exact names: USERPROFILE, SYSTEMROOT, WINDIR, SYSTEMDRIVE, COMSPEC, PATHEXT, TEMP, TMP, PATH, PSMODULEPATH, HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
  • Prefixes on all platforms: DAGU_, DAG_, LC_, KUBERNETES_

Dagu also sets runtime variables with the DAG_ prefix for each step execution:

  • DAG_NAME, DAG_RUN_ID, DAG_RUN_STEP_NAME
  • DAG_RUN_LOG_FILE, DAG_RUN_STEP_STDOUT_FILE, DAG_RUN_STEP_STDERR_FILE

You can extend the forwarded set in Dagu configuration:

yaml
env_passthrough:
  - SSL_CERT_FILE
  - HTTP_PROXY
  - HTTPS_PROXY
  - NO_PROXY

env_passthrough_prefixes:
  - AWS_

Or with environment variables:

bash
export DAGU_ENV_PASSTHROUGH=SSL_CERT_FILE,HTTP_PROXY,HTTPS_PROXY,NO_PROXY
export DAGU_ENV_PASSTHROUGH_PREFIXES=AWS_

These settings only forward matching variables that already exist in the Dagu process environment. On Unix, matching is case-sensitive. On Windows, matching is case-insensitive.

If a variable is not forwarded automatically, you can still make it available to the step by defining it explicitly in the workflow.

Use .env or env: when you want non-sensitive step environment contents to be explicit. For credentials and other secrets, prefer the secrets: block so values are resolved at runtime and masked in logs:

yaml
# workflow.yaml
dotenv: .env.runtime  # Load non-sensitive defaults from a file

secrets:
  - name: AWS_ACCESS_KEY_ID
    provider: env
    key: PROD_AWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    provider: env
    key: PROD_AWS_SECRET_ACCESS_KEY
  - name: DATABASE_PASSWORD
    provider: file
    key: /run/secrets/db-password

steps:
  - id: deploy
    run: aws s3 sync ./build s3://my-bucket
bash
# .env.runtime (safe to keep local; do not store secrets here)
AWS_REGION=ap-northeast-1
S3_BUCKET=my-bucket

If you need to copy a non-sensitive process variable into workflow env:, do it explicitly:

yaml
# workflow.yaml
env:
  - AWS_REGION: ${AWS_REGION}
  - S3_BUCKET: ${S3_BUCKET}

steps:
  - id: deploy
    run: aws s3 sync ./build s3://${S3_BUCKET}

See Secrets for provider details and masking behavior.

Process Isolation

bash
# systemd security features
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs

See Also

Released under the MIT License.