Operations

Production deployment and monitoring.

Running as a Service

systemd

Create /etc/systemd/system/dagu.service:

ini

[Unit]
Description=Dagu Workflow Engine
Documentation=https://docs.dagu.sh/
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=dagu
Group=dagu
WorkingDirectory=/opt/dagu

# Main process
ExecStart=/usr/local/bin/dagu start-all

# Graceful shutdown
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM

# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=60
StartLimitBurst=3

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs

# Resource limits
LimitNOFILE=65536
LimitNPROC=4096

# Environment
EnvironmentFile=-/etc/dagu/environment
Environment="DAGU_HOME=/opt/dagu"

[Install]
WantedBy=multi-user.target

Create /etc/dagu/environment:

bash

DAGU_HOST=0.0.0.0
DAGU_PORT=8080
DAGU_TZ=America/New_York
DAGU_LOG_FORMAT=json

Setup:

bash

# Create user and directories
sudo useradd -r -s /bin/false dagu
sudo mkdir -p /opt/dagu/{dags,data,logs}
sudo chown -R dagu:dagu /opt/dagu

# Enable and start
sudo systemctl enable dagu
sudo systemctl start dagu

# Check status
sudo systemctl status dagu
sudo journalctl -u dagu -f

Docker Compose

compose.yml:

yaml

version: '3.8'

services:
  dagu:
    image: ghcr.io/dagucloud/dagu:latest
    container_name: dagu
    restart: unless-stopped
    
    # Health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
    # Port mapping
    ports:
      - "8080:8080"
    
    # Environment variables
    environment:
      # Server configuration
      - DAGU_PORT=8080
      - DAGU_HOST=0.0.0.0
      - DAGU_TZ=America/New_York
      
      # Logging
      - DAGU_LOG_FORMAT=json
      
      # Authentication (optional)
      # - DAGU_AUTH_BASIC_USERNAME=admin
      # - DAGU_AUTH_BASIC_PASSWORD=your-secure-password
      
      # User/Group IDs (optional)
      # - PUID=1000
      # - PGID=1000
      
      # Docker-in-Docker support (optional)
      # - DOCKER_GID=999
    
    # Volume mounts
    volumes:
      - dagu:/var/lib/dagu
      
      # Docker socket for Docker executor (optional)
      # - /var/run/docker.sock:/var/run/docker.sock
    
    # Logging configuration
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

volumes:
  dagu:

bash

# Start
docker compose up -d

# Logs
docker compose logs -f

# Stop
docker compose down
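After docker compose up -d, the container can take a while to pass its health check (the compose file allows a 40s start period). The sketch below polls until a command succeeds. The function name wait_until and its arguments are hypothetical helpers, not part of Dagu or Docker; the health URL is the one used in the compose health check above.

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example use (requires curl and a running container):
#   wait_until 10 3 curl -fsS http://localhost:8080/api/v1/health \
#     && echo "dagu is healthy"
```

Passing the probe as a command keeps the loop independent of curl, so the same helper works with docker inspect or any other check.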

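With authentication, put the credentials in a .env file and load it into the container, for example with env_file: .env on the dagu service (Compose uses .env for variable substitution in compose.yml and does not pass its values to containers on its own):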

bash

DAGU_AUTH_BASIC_USERNAME=admin
DAGU_AUTH_BASIC_PASSWORD=secure-password

Resource Monitoring

Dagu provides built-in resource monitoring that tracks CPU, memory, disk, and load average. The data is displayed in the System Status page of the web UI.

Configuration:

yaml

# config.yaml
monitoring:
  retention: "24h"    # How long to keep history (default: 24h)
  interval: "5s"      # Collection frequency (default: 5s)

bash

# Or via environment variables
export DAGU_MONITORING_RETENTION=12h
export DAGU_MONITORING_INTERVAL=10s

Metrics collected:

CPU Usage - Overall CPU utilization percentage
Memory Usage - RAM utilization percentage
Disk Usage - Disk space utilization for the data directory
Load Average - 1-minute system load average

API Endpoint:

bash

# Get resource history (last hour by default)
curl http://localhost:8080/api/v1/services/resources/history

# Get last 30 minutes
curl "http://localhost:8080/api/v1/services/resources/history?duration=30m"

Memory Usage: Resource history is stored in memory. With default settings (5s interval, 24h retention), memory usage is approximately 1.1MB for all 4 metrics.
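To see how retention and interval drive that figure, the arithmetic can be sketched as below. The helper names are hypothetical. Dividing the stated 1.1MB by the resulting sample count suggests roughly 16 bytes per sample, which is an inference from the numbers above rather than a documented detail.

```bash
# samples_per_metric RETENTION_SECONDS INTERVAL_SECONDS
# Number of data points kept for one metric.
samples_per_metric() {
  echo $(( $1 / $2 ))
}

# total_samples RETENTION_SECONDS INTERVAL_SECONDS METRIC_COUNT
# Number of data points kept across all metrics.
total_samples() {
  echo $(( $1 / $2 * $3 ))
}

# Defaults: 24h retention (86400 s), 5 s interval, 4 metrics.
samples_per_metric 86400 5    # prints 17280
total_samples 86400 5 4       # prints 69120

# Environment-variable example above: 12h retention, 10 s interval.
total_samples 43200 10 4      # prints 17280
```

Assuming memory scales linearly with the number of samples, the 12h/10s settings would need about a quarter of the default footprint.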

Prometheus Metrics

Metrics available at /api/v1/metrics:

System:

dagu_info - Build information
dagu_uptime_seconds - Uptime
dagu_scheduler_running - Scheduler status

DAGs:

dagu_dags_total - Total DAGs
dagu_dag_runs_currently_running - Running DAGs
dagu_dag_runs_queued_total - Queued DAGs
dagu_dag_runs_total - DAG runs by status (24h)

Workers:

dagu_workers_registered - Registered distributed workers
dagu_worker_info - Worker heartbeat labels as key/value metadata
dagu_worker_heartbeat_timestamp_seconds - Last worker heartbeat timestamp
dagu_worker_health_status - Worker health by heartbeat freshness
dagu_worker_pollers - Worker poller capacity by state
dagu_worker_running_tasks - Running tasks per worker
dagu_worker_oldest_running_task_age_seconds - Oldest running task age per worker

Standard:

Go runtime metrics
Process metrics
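For a quick check without a Prometheus server, a single value can be pulled out of the endpoint's text output. This is a minimal sketch: metric_value is a hypothetical helper, the sample input is made up for illustration, and it only handles samples without labels (labeled series such as per-worker metrics need a real parser).

```bash
# metric_value NAME
# Reads Prometheus text exposition format on stdin and prints the value of
# every unlabeled sample whose metric name is exactly NAME.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Hypothetical sample input, for illustration only.
sample='# HELP dagu_uptime_seconds Uptime
# TYPE dagu_uptime_seconds gauge
dagu_uptime_seconds 3600
dagu_scheduler_running 1'

printf '%s\n' "$sample" | metric_value dagu_scheduler_running   # prints 1

# Against a running server (requires curl):
#   curl -fsS http://localhost:8080/api/v1/metrics | metric_value dagu_scheduler_running
```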

Logging

yaml

# config.yaml
log_format: json    # text or json
debug: true       # Debug mode
paths:
  log_dir: /var/log/dagu

bash

# Or via environment
export DAGU_LOG_FORMAT=json
export DAGU_DEBUG=true
export DAGU_LOG_DIR=/var/log/dagu

JSON log example:

json

{
  "time": "2024-03-15T12:00:00Z",
  "level": "INFO",
  "msg": "DAG execution started",
  "dag": "data-pipeline",
  "run_id": "20240315_120000_abc123"
}
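JSON output makes it straightforward to filter logs by field. The sketch below keeps only error entries. It assumes one JSON object per line and that errors are logged with "level": "ERROR" (the example above only shows INFO); errors_only is a hypothetical helper, and the sample lines are made up.

```bash
# errors_only
# Reads one JSON log object per line on stdin and prints those whose
# "level" field is "ERROR", tolerating optional spaces around the colon.
errors_only() {
  grep -E '"level"[[:space:]]*:[[:space:]]*"ERROR"'
}

# Hypothetical log lines, for illustration only.
printf '%s\n' \
  '{"level":"INFO","msg":"DAG execution started","dag":"data-pipeline"}' \
  '{"level":"ERROR","msg":"step failed","dag":"data-pipeline"}' \
  | errors_only

# With the systemd unit from this page, logs can be read from the journal:
#   sudo journalctl -u dagu -o cat | errors_only
```

A pattern match like this can also hit the same text inside a message string, so a JSON-aware tool is the safer choice for anything beyond ad hoc inspection.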

Log Cleanup

Automatic cleanup based on hist_retention_days:

yaml

# Per-DAG
hist_retention_days: 7  # Keep 7 days

# Or global in base.yaml
hist_retention_days: 30  # Default

Notes:

hist_retention_days defaults to 30.
hist_retention_days: 0 also uses the default 30-day retention.
Negative values disable automatic cleanup.
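The three rules above can be restated as a small function. This is only an illustration of the documented behavior, not Dagu's implementation, and effective_retention is a hypothetical name.

```bash
# effective_retention DAYS
# Prints the retention that applies for a given hist_retention_days value:
#   positive -> that many days, 0 -> the 30-day default, negative -> "disabled".
effective_retention() {
  if [ "$1" -lt 0 ]; then
    echo "disabled"
  elif [ "$1" -eq 0 ]; then
    echo 30
  else
    echo "$1"
  fi
}

effective_retention 7     # prints 7
effective_retention 0     # prints 30
effective_retention -1    # prints disabled
```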

Deletes:

Execution logs
Step output (.out, .err)
Status files (.jsonl)
Sub DAG logs
Run artifact directories

For manual cleanup, use dagu rm --history:

bash

# Preview all removable history
dagu rm --history --dry-run my-workflow

# Remove history older than 30 days
dagu rm -H --older-than 30d my-workflow

# Remove history and its YAML definition together
dagu rm -H --definition my-workflow.yaml

The command preserves active runs and prompts before deleting unless --force is set. Definition removal is refused while the DAG has an active local or distributed run.

Viewing Run History

Before cleaning up logs, review execution history with dagu history or preview the exact deletion with dagu rm --dry-run:

bash

# Inspect recent history
dagu history my-workflow --limit 100

# Check run status before deletion
dagu history my-workflow --from 2025-01-01 --to 2025-12-31

# Preview history older than 30 days
dagu rm -H -t 30d --dry-run my-workflow

The history command helps:

Identify which runs to keep/delete
Verify cleanup results
Export run metadata before cleanup: dagu history --format json

See the history CLI reference and rm command. The older cleanup command remains available as a deprecated history-only alias.

Alerting

For team-managed notification routes, use Notifications in the Web UI. It supports reusable Slack, email, Telegram, and webhook channels with Global and workspace rules.

For managed provider incidents, use Incident Routing. On self-hosted Dagu, incident routing requires an active license or trial.

Email

yaml

# base.yaml
smtp:
  host: "smtp.gmail.com"
  port: "587"
  username: "notifications@company.com"
  password: "${env.SMTP_PASSWORD}"

error_mail:
  from: "dagu@company.com"
  to: "ops-team@company.com"
  prefix: "[ERROR]"
  attach_logs: true

mail_on:
  failure: true
  success: false

Per-step notification:

yaml

steps:
  - id: critical_task
    run: echo "Processing"
    mail_on_error: true

Webhooks

Slack:

yaml

handler_on:
  failure:
    action: http.request
    with:
      url: "${env.SLACK_WEBHOOK_URL}"
      method: POST
      body: |
        {
          "text": "Workflow Failed: ${context.dag.name}",
          "blocks": [{
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*Run ID:* ${context.run.id}"
            }
          }]
        }

For managed provider incidents that open after final failure and resolve on recovery, prefer Incident Routing. The YAML pattern below is still useful when the provider call should travel with one workflow definition.

PagerDuty:

yaml

handler_on:
  failure:
    action: http.request
    with:
      url: https://events.pagerduty.com/v2/enqueue
      method: POST
      body: |
        {
          "routing_key": "${env.PAGERDUTY_KEY}",
          "event_action": "trigger",
          "payload": {
            "summary": "Failed: ${context.dag.name}",
            "source": "dagu",
            "severity": "error"
          }
        }

Security

Environment Variable Filtering

Dagu filters the process environment before it builds the step execution environment and before it starts sub-DAG executions.

System environment variables are still available for ${VAR} expansion when Dagu parses the DAG.

Built-in forwarded variables:

Unix and macOS exact names: PATH, HOME, USER, SHELL, TMPDIR, TERM, EDITOR, VISUAL, LANG, LC_ALL, LC_CTYPE, TZ, LD_LIBRARY_PATH, XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
Windows exact names: USERPROFILE, SYSTEMROOT, WINDIR, SYSTEMDRIVE, COMSPEC, PATHEXT, TEMP, TMP, PATH, PSMODULEPATH, HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
Prefixes on all platforms: DAGU_, DAG_, LC_, KUBERNETES_

Dagu-managed runtime metadata is available through canonical ${context.*} references in value-resolved workflow fields. Dagu also projects selected values with the DAG_ prefix for each step execution:

DAG_NAME, DAG_RUN_ID, DAG_RUN_STEP_NAME
DAG_RUN_LOG_FILE, DAG_RUN_STEP_STDOUT_FILE, DAG_RUN_STEP_STDERR_FILE

You can extend the forwarded set in Dagu configuration:

yaml

env_passthrough:
  - SSL_CERT_FILE
  - HTTP_PROXY
  - HTTPS_PROXY
  - NO_PROXY

env_passthrough_prefixes:
  - AWS_

Or with environment variables:

bash

export DAGU_ENV_PASSTHROUGH=SSL_CERT_FILE,HTTP_PROXY,HTTPS_PROXY,NO_PROXY
export DAGU_ENV_PASSTHROUGH_PREFIXES=AWS_

These settings only forward matching variables that already exist in the Dagu process environment. On Unix, matching is case-sensitive. On Windows, matching is case-insensitive.
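The matching rules for Unix can be sketched as a predicate, which is handy for checking whether a given variable would reach a step. This restates the lists above and is not Dagu's code: is_forwarded is a hypothetical helper, and it reads the same DAGU_ENV_PASSTHROUGH and DAGU_ENV_PASSTHROUGH_PREFIXES variables as comma-separated lists.

```bash
# is_forwarded NAME
# Returns 0 if NAME matches a built-in Unix exact name, a built-in prefix,
# or an entry from DAGU_ENV_PASSTHROUGH / DAGU_ENV_PASSTHROUGH_PREFIXES.
# Matching is case-sensitive, as on Unix.
is_forwarded() {
  name=$1
  for exact in PATH HOME USER SHELL TMPDIR TERM EDITOR VISUAL LANG LC_ALL \
      LC_CTYPE TZ LD_LIBRARY_PATH XDG_CONFIG_HOME XDG_DATA_HOME XDG_CACHE_HOME \
      DOCKER_HOST DOCKER_TLS_VERIFY DOCKER_CERT_PATH DOCKER_API_VERSION \
      $(printf '%s' "${DAGU_ENV_PASSTHROUGH:-}" | tr ',' ' '); do
    if [ "$name" = "$exact" ]; then
      return 0
    fi
  done
  for prefix in DAGU_ DAG_ LC_ KUBERNETES_ \
      $(printf '%s' "${DAGU_ENV_PASSTHROUGH_PREFIXES:-}" | tr ',' ' '); do
    case $name in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Example: report whether AWS_REGION would be forwarded with the current settings.
if is_forwarded AWS_REGION; then
  echo "AWS_REGION: forwarded"
else
  echo "AWS_REGION: filtered"
fi
```

The check only covers name matching; as noted above, a matching variable is forwarded only if it is actually set in the Dagu process environment.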

If a variable is not forwarded automatically, you can still make it available to the step by defining it explicitly in the workflow.

Use .env or env: when you want non-sensitive step environment contents to be explicit. For credentials and other secrets, prefer the secrets: block so values are resolved at runtime and masked in logs:

yaml

# workflow.yaml
dotenv: .env.runtime  # Load non-sensitive defaults from a file

secrets:
  - name: AWS_ACCESS_KEY_ID
    provider: env
    key: PROD_AWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    provider: env
    key: PROD_AWS_SECRET_ACCESS_KEY
  - name: DATABASE_PASSWORD
    provider: file
    key: /run/secrets/db-password

steps:
  - id: deploy
    run: aws s3 sync ./build s3://my-bucket

bash

# .env.runtime (safe to keep local; do not store secrets here)
AWS_REGION=ap-northeast-1
S3_BUCKET=my-bucket

If you need to copy a non-sensitive process variable into workflow env:, do it explicitly:

yaml

# workflow.yaml
env:
  - AWS_REGION: ${AWS_REGION}
  - S3_BUCKET: ${S3_BUCKET}

steps:
  - id: deploy
    run: aws s3 sync ./build s3://${env.S3_BUCKET}

See Secrets for provider details and masking behavior.

Process Isolation

ini

# systemd security features
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs
