Dagu

Workflow Orchestration Engine

Single binary. No external dependencies. Scales from standalone to a distributed cluster over gRPC.

(Screenshot: Cockpit demo)

Try It Live

Explore without installing: Live Demo

Credentials: demouser / demouser

What Dagu Does

Dagu is a workflow orchestration engine that runs as a single binary with no external databases or message brokers. Workflows are defined as DAGs (Directed Acyclic Graphs) in YAML. It supports local execution, cron scheduling, queue-based concurrency control, and distributed coordinator/worker execution across multiple machines over gRPC.

All state is stored in local files by default. There is nothing to install besides the binary itself.

Production Use Cases

Data pipeline orchestration. Define ETL/ELT workflows as DAGs with parallel and sequential steps. Use the built-in SQL executor to query PostgreSQL or SQLite, the S3 executor to move files to/from object storage, the jq executor for JSON transformation, and sub-DAG composition to break large pipelines into reusable stages. Steps can pass outputs to downstream steps via environment variables.
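As a sketch, a three-stage pipeline in which each step hands its result to the next might look like this. The helper scripts are placeholders, and the output: field name for capturing a step's stdout into a variable is an assumption to check against the YAML Reference:

```yaml
steps:
  - name: extract
    command: ./extract.sh s3://bucket/raw.csv   # placeholder script
    output: RAW_PATH                            # capture stdout into a variable (assumed field)

  - name: transform
    command: ./transform.sh ${RAW_PATH}         # downstream step reads the captured output
    depends: [extract]

  - name: load
    command: ./load.sh
    depends: [transform]
```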

Infrastructure automation. Run commands on remote machines via the SSH executor with key-based authentication. Execute containers via the Docker or Kubernetes executor. Use preconditions to gate steps on environment checks, and lifecycle hooks (onSuccess, onFailure, onExit) to handle cleanup or notifications.
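A hedged sketch of a gated step with cleanup hooks: the preconditions/condition/expected field names are assumptions to verify against the YAML Reference, handlerOn matches the scheduling example later on this page, and the scripts are placeholders:

```yaml
steps:
  - name: provision
    command: ./provision.sh        # placeholder script
    preconditions:
      - condition: "`uname -s`"    # gate the step on an environment check
        expected: "Linux"

handlerOn:
  failure:
    command: ./notify.sh           # placeholder notification script
  exit:
    command: ./cleanup.sh          # always-run cleanup
```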

Scheduled job management. Replace crontab with DAGs that have cron scheduling, timezone support, retry policies, overlap control (skip, all, latest), and a web UI showing execution history, logs, and real-time status. Zombie detection automatically identifies and handles stalled runs.

Batch processing at scale. Distribute compute-heavy workloads across a pool of workers using the coordinator/worker architecture. Workers connect to a coordinator over gRPC, pull tasks from a queue, and report status back. Workers support label-based routing (e.g., gpu=true, region=us-east-1) so DAGs target specific machine capabilities.

Legacy script orchestration. Wrap existing shell scripts, Python scripts, HTTP calls, or any executable into workflow steps without modifying them. Dagu orchestrates execution order, captures stdout/stderr per step, and handles retries and error propagation around your existing code.
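For instance, two existing scripts can be chained without touching their contents. Paths here are placeholders, and the retryPolicy fields follow the retry example later on this page:

```yaml
steps:
  - name: nightly-report
    command: /opt/legacy/run_report.sh      # existing script, unmodified (placeholder path)
    retryPolicy:
      limit: 2
      intervalSec: 30

  - name: upload-results
    command: python3 /opt/legacy/upload.py  # placeholder path
    depends: [nightly-report]
```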

Architecture

Dagu runs in three configurations:

Standalone. A single dagu start-all process runs the HTTP server, scheduler, and executor. Suitable for single-machine deployments.

Coordinator/Worker. The scheduler enqueues jobs to a file-based queue, then dispatches them to a coordinator over gRPC. Workers long-poll the coordinator for tasks, execute DAGs locally, and report status back. Workers can run on separate machines and receive tasks routed by label. Mutual TLS secures gRPC communication between coordinator and workers.

Headless. Run without the web UI (DAGU_HEADLESS=true). Useful for CI/CD environments or when Dagu is managed through the CLI or API only.

Standalone:

  ┌─────────────────────────────────────────┐
  │  dagu start-all                         │
  │  ┌───────────┐ ┌───────────┐ ┌────────┐│
  │  │ HTTP / UI │ │ Scheduler │ │Executor││
  │  └───────────┘ └───────────┘ └────────┘│
  │  File-based storage (logs, state, queue)│
  └─────────────────────────────────────────┘

Distributed:

  ┌────────────┐                   ┌────────────┐
  │ Scheduler  │                   │ HTTP / UI  │
  │            │                   │            │
  │ ┌────────┐ │                   └─────┬──────┘
  │ │ Queue  │ │  Dispatch (gRPC)        │
  │ │(file)  │ │─────────┐               │
  │ └────────┘ │         │               │
  └────────────┘         ▼               ▼
                    ┌─────────────────────────┐
                    │      Coordinator        │
                    │  (gRPC task dispatch,   │
                    │   worker registry,      │
                    │   health monitoring)    │
                    └────────┬────────────────┘

                   Poll (gRPC long-polling)

               ┌─────────────┼─────────────┐
               │             │             │
          ┌────▼───┐    ┌────▼───┐    ┌────▼───┐
          │Worker 1│    │Worker 2│    │Worker N│
          └────┬───┘    └────┬───┘    └────┬───┘
               │             │             │
               └─────────────┴─────────────┘
                 Heartbeat / ReportStatus /
                 StreamLogs (gRPC)

Quick Start

Install

Linux/macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.ps1 | iex
```

Docker:

```bash
docker run --rm -v ~/.dagu:/var/lib/dagu -p 8080:8080 ghcr.io/dagucloud/dagu:latest dagu start-all
```

Homebrew:

```bash
brew install dagu
```

Kubernetes (Helm):

```bash
helm repo add dagu https://dagucloud.github.io/dagu
helm repo update
helm install dagu dagu/dagu --set persistence.storageClass=<your-rwx-storage-class>
```

The script installers run a guided wizard that installs Dagu, adds it to your PATH, sets up a background service, and creates the initial admin account. Homebrew, Docker, and Helm install without the wizard. See the Installation Guide for all options.

Create and Run a Workflow

```bash
cat > hello.yaml << 'EOF'
steps:
  - command: echo "Hello from Dagu!"
  - command: echo "Step 2"
EOF

dagu start hello.yaml
```

Start the Server

```bash
dagu start-all
```

Visit http://localhost:8080

Built-in Executors

Dagu includes 18 built-in step executors. Each runs inside the Dagu process (or on a worker); no plugins or external runtimes are required.

| Executor | Purpose |
| --- | --- |
| command | Shell commands and scripts (bash, sh, PowerShell, custom shells) |
| docker | Run containers with registry auth, volume mounts, resource limits |
| kubernetes | Execute Kubernetes Pods with resource requests, service accounts, namespaces |
| ssh | Remote command execution with key-based auth and SFTP file transfer |
| http | HTTP requests (GET, POST, PUT, DELETE) with headers and authentication |
| sql | Query PostgreSQL and SQLite with parameterized queries and result capture |
| redis | Redis commands, pipelines, and Lua scripts |
| s3 | Upload, download, list, and delete S3 objects |
| jq | JSON transformation using jq expressions |
| mail | Send email via SMTP |
| archive | Create zip/tar archives with glob patterns |
| dag | Invoke another DAG as a sub-workflow with parameter passing |
| router | Conditional step routing based on expressions |
| template | Text generation with template rendering |
| chat | LLM inference (OpenAI, Anthropic, Google Gemini, OpenRouter) |
| agentstep | Multi-step LLM agent execution with tool calling |
| harness | Run coding agent CLIs (Claude Code, Codex, Copilot, OpenCode, Pi) as workflow steps |

See Step Types for configuration details of each executor.
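As one sketch of executor configuration, here is an HTTP step using the step-level type/config shape that the SSH example later on this page follows. The method, url, headers, and body config keys are assumptions to verify in Step Types; the endpoint is a placeholder:

```yaml
steps:
  - name: trigger-webhook
    type: http                                # select the http executor
    config:
      method: POST
      url: https://example.com/hooks/build    # placeholder endpoint
      headers:
        Content-Type: application/json
      body: '{"ref": "main"}'
```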

Scheduling and Reliability

| Feature | Details |
| --- | --- |
| Cron scheduling | Timezone support, multiple schedule entries per DAG |
| Overlap policies | skip (default), all (queue all), latest (keep only the most recent) |
| Catch-up scheduling | Automatically runs missed intervals when the scheduler was down |
| Zombie detection | Identifies and handles stalled DAG runs (configurable interval, default 45s) |
| Retry policies | Per-step retry with configurable limits, intervals, exit code filtering, exponential/linear/constant backoff |
| Lifecycle hooks | onInit, onSuccess, onFailure, onAbort, onExit, onWait |
| Preconditions | Gate DAG or step execution on shell command results |
| Queue system | File-based persistent queue with configurable concurrency limits per queue |
| Scheduler HA | Lock with stale detection for failover across multiple scheduler instances |

Security and Access Control

Authentication

Four authentication modes, configured via DAGU_AUTH_MODE:

| Mode | Description |
| --- | --- |
| none | No authentication |
| basic | HTTP Basic authentication |
| builtin | JWT-based authentication with user management, API keys, and per-DAG webhook tokens |
| OIDC | OpenID Connect integration with any compliant identity provider |
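For example, a minimal basic-auth setup might look like the following. DAGU_AUTH_MODE is documented above; the username and password variable names are assumptions to verify in the Configuration Reference:

```shell
export DAGU_AUTH_MODE=basic
export DAGU_AUTH_BASIC_USERNAME=admin      # assumed variable name
export DAGU_AUTH_BASIC_PASSWORD=changeme   # assumed variable name
```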

Role-Based Access Control

When using builtin auth, five roles control access:

| Role | Capabilities |
| --- | --- |
| admin | Full access including user management |
| manager | Create, edit, delete, run, stop DAGs; view audit logs |
| developer | Create, edit, delete, run, stop DAGs |
| operator | Run and stop DAGs only (no editing) |
| viewer | Read-only access |

API keys can be created with independent role assignments. Audit logging tracks all actions.

TLS and Secrets

  • TLS for the HTTP server (DAGU_CERT_FILE, DAGU_KEY_FILE)
  • Mutual TLS for gRPC coordinator/worker communication (DAGU_PEER_CERT_FILE, DAGU_PEER_KEY_FILE, DAGU_PEER_CLIENT_CA_FILE)
  • Secret management with three providers: environment variables, files, and HashiCorp Vault
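Wiring up TLS is a matter of exporting the listed variables before starting the server, coordinator, and workers. The variable names are from the list above; the certificate paths are placeholders:

```shell
# HTTP server TLS
export DAGU_CERT_FILE=/etc/dagu/tls/server.crt
export DAGU_KEY_FILE=/etc/dagu/tls/server.key

# Mutual TLS for coordinator/worker gRPC
export DAGU_PEER_CERT_FILE=/etc/dagu/tls/peer.crt
export DAGU_PEER_KEY_FILE=/etc/dagu/tls/peer.key
export DAGU_PEER_CLIENT_CA_FILE=/etc/dagu/tls/ca.crt
```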

Observability

Prometheus Metrics

Dagu exposes Prometheus-compatible metrics at the /metrics endpoint:

| Metric | Description |
| --- | --- |
| dagu_dag_runs_total | Total DAG runs by status |
| dagu_dag_runs_total_by_dag | Per-DAG run counts |
| dagu_dag_run_duration_seconds | Histogram of run durations |
| dagu_dag_runs_currently_running | Active DAG runs |
| dagu_dag_runs_queued_total | Queued runs |
| dagu_queue_wait_time | Queue wait time histogram |
| dagu_uptime_seconds | Server uptime |
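A minimal Prometheus scrape configuration for this endpoint might look like the following (the job name and target address are placeholders for your deployment):

```yaml
scrape_configs:
  - job_name: dagu
    metrics_path: /metrics          # Dagu's metrics endpoint
    static_configs:
      - targets: ["dagu.internal:8080"]   # placeholder host:port
```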

OpenTelemetry

OpenTelemetry tracing can be configured per DAG, with an OTLP endpoint, custom headers, resource attributes, and TLS options.

Structured Logging and Notifications

  • JSON or text format logging (DAGU_LOG_FORMAT), per-run log files with separate stdout/stderr capture per step
  • Slack and Telegram bot integration for run status events (succeeded, failed, aborted, waiting, rejected)
  • Email notifications on DAG success, failure, or wait status via SMTP
  • Per-DAG webhook endpoints with token authentication

Distributed Execution

The coordinator/worker architecture distributes DAG execution across multiple machines:

  • Coordinator: gRPC server managing task distribution, worker registry, and health monitoring
  • Workers: Connect to the coordinator, pull tasks via long-polling, execute DAGs locally, stream logs back
  • Worker labels: Route DAGs to specific workers based on labels (e.g., gpu=true, region=us-east-1)
  • Health checks: HTTP health endpoints on coordinator and workers for load balancer integration
  • Queue system: File-based persistent queue with configurable concurrency limits
```bash
# Start coordinator
dagu coord

# Start workers (on separate machines)
DAGU_WORKER_LABELS=gpu=true,memory=64G dagu worker
```

See the Distributed Execution documentation for setup details.
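In a DAG definition, label-based routing might look like the sketch below. The workerSelector field name is an assumption to verify in the Distributed Execution docs; the label values mirror the examples above and the command is a placeholder:

```yaml
workerSelector:            # assumed field name: route this DAG to matching workers
  gpu: "true"
  region: us-east-1
steps:
  - name: train-model
    command: python3 train.py   # placeholder command
```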

Workflow Examples

Parallel Execution with Dependencies

```yaml
type: graph
steps:
  - id: extract
    command: ./extract.sh

  - id: transform_a
    command: ./transform_a.sh
    depends: [extract]

  - id: transform_b
    command: ./transform_b.sh
    depends: [extract]

  - id: load
    command: ./load.sh
    depends: [transform_a, transform_b]
```

Docker Step

```yaml
steps:
  - name: build
    container:
      image: node:20-alpine
    command: npm run build
```

Retry with Exponential Backoff

```yaml
steps:
  - name: flaky-api-call
    command: curl -f https://api.example.com/data
    retryPolicy:
      limit: 3
      intervalSec: 10
      backoff: 2
      maxIntervalSec: 120
    continueOn:
      failure: true
```

Scheduling with Overlap Control

```yaml
schedule:
  - "0 */6 * * *"
overlapPolicy: skip
timeoutSec: 3600
handlerOn:
  failure:
    command: notify-team.sh
  exit:
    command: cleanup.sh
```

Sub-DAG Composition

```yaml
steps:
  - name: extract
    call: etl/extract
    params: "SOURCE=s3://bucket/data.csv"

  - name: transform
    call: etl/transform
    params: "INPUT=${extract.outputs.result}"
    depends: [extract]

  - name: load
    call: etl/load
    params: "DATA=${transform.outputs.result}"
    depends: [transform]
```

SSH Remote Execution

```yaml
steps:
  - name: deploy
    type: ssh
    config:
      host: prod-server.example.com
      user: deploy
      key: ~/.ssh/id_rsa
    command: cd /var/www && git pull && systemctl restart app
```

See Examples for more patterns.

Version-Controlled Workflows

Dagu supports Git sync to keep DAG definitions under version control. Set DAGU_GITSYNC_ENABLED=true with a repository URL, and Dagu pulls DAG definitions from a Git branch. Optional auto-sync polls the repository at a configurable interval (default 300s). Token and SSH authentication are supported.
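A sketch of the relevant environment variables: only DAGU_GITSYNC_ENABLED is confirmed by the text above; the remaining variable names are assumptions to check in the Git Sync docs, and the repository is a placeholder:

```shell
export DAGU_GITSYNC_ENABLED=true
export DAGU_GITSYNC_REPO_URL=git@github.com:acme/dags.git   # assumed variable name; placeholder repo
export DAGU_GITSYNC_BRANCH=main                             # assumed variable name
export DAGU_GITSYNC_INTERVAL=300                            # assumed variable name (seconds)
```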

See Git Sync for configuration.

CLI Reference

| Command | Description |
| --- | --- |
| dagu start <dag> | Execute a DAG |
| dagu start-all | Start HTTP server + scheduler |
| dagu server | Start HTTP server only |
| dagu scheduler | Start scheduler only |
| dagu coord | Start coordinator (distributed mode) |
| dagu worker | Start worker (distributed mode) |
| dagu stop <dag> | Stop a running DAG |
| dagu restart <dag> | Restart a DAG |
| dagu retry <dag> <run-id> | Retry a failed run |
| dagu dry <dag> | Dry run (show what would execute) |
| dagu status <dag> | Show DAG run status |
| dagu history <dag> | Show execution history |
| dagu validate <dag> | Validate DAG YAML |
| dagu enqueue <dag> | Add DAG to the execution queue |
| dagu dequeue <dag> | Remove DAG from the queue |
| dagu cleanup | Clean up old run data |
| dagu migrate | Run database migrations |

Full CLI and environment variable reference: CLI | Configuration Reference

Learn More

Overview

Architecture and core concepts

Getting Started

Installation and first workflow

Writing Workflows

YAML syntax, scheduling, execution control

YAML Reference

All configuration options

Step Types

All 18 executor types

Distributed Execution

Coordinator/worker setup

Authentication

RBAC, OIDC, API keys, audit logging

Server Administration

Deployment, configuration, operations

Community

Released under the MIT License.