
Lightweight and Powerful Workflow Engine
Dagu is a lightweight workflow engine that comes with a Web UI. Define workflows in a simple, declarative YAML format and run existing scripts or tools without modification. It natively supports shell commands, Docker containers, Kubernetes Jobs, SSH commands, and more.
Start with one self-contained binary and file-backed state. No DBMS or message broker is required, and you can add queues, workers, MCP, or AI-agent steps only when your workflows need them.
Motivation
Complex systems often have implicit dependencies between jobs. When a server runs hundreds of cron jobs, it is hard to keep track of those dependencies and to determine which job to rerun when one fails. It is also tedious to SSH into a server to read logs and rerun shell scripts one by one. Dagu addresses these problems by making pipeline dependencies explicit as a DAG that you can visualize and manage, and by providing a Web UI for checking dependencies, execution status, and logs, and for rerunning or stopping jobs with a single click.
There are many existing tools, such as Airflow, Prefect, and Temporal, but many of them require you to define your DAG in code, using a programming language such as Python. Many systems already contain complex jobs with hundreds of thousands of lines of code, and adding another layer of code on top of them can reduce maintainability. Dagu was designed to be easy to use, self-contained, and to require no coding, which makes it a good fit for small teams.
How a Workflow Runs
Dagu does not make you rewrite the work. Your scripts, SQL files, containers, SSH commands, APIs, and services can stay as they are; the YAML adds inputs, order, logs, retries, approvals, artifacts, and recovery controls around them.
```yaml
params:
  - name: customer_id
    type: string
    description: Customer or account identifier
  - name: change_scope
    type: string
    description: What the repair is allowed to change
    enum:
      - metadata_only
      - permissions
      - full_account
    default: metadata_only
  - name: dry_run
    type: boolean
    default: true

steps:
  - id: inspect_account
    run: ./scripts/inspect-account.sh --customer "${customer_id}"
    stdout:
      artifact: reports/inspection.md

  - id: review
    action: noop
    depends: inspect_account
    approval:
      prompt: Review the inspection report before running the repair.

  - id: repair_account
    run: >-
      ./scripts/repair-account.sh
      --customer "${customer_id}"
      --scope "${change_scope}"
      --dry-run="${dry_run}"
    depends: review
    stdout:
      artifact: reports/repair.log
```

In this example, the DAG turns an existing account-repair runbook into a reviewed workflow. The params block gives Dagu enough information to render a guided input form before the run starts. The inspection output is stored as an artifact, the repair waits for explicit approval, and the submitted values, logs, artifacts, and status stay attached to the run history.
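To make the form behavior concrete, here is a hypothetical Python sketch of how declared parameters could be resolved before a run. Submitted values override defaults, enum values are checked, and a boolean submitted as text is coerced. It assumes that a parameter without a default is required. The `resolve_params` helper and the dictionary layout are illustrative only, and Dagu's actual validation rules may differ.

```python
from typing import Any, Dict, List

# Plain-Python stand-in for the params block above (not Dagu's internal model).
PARAM_SPECS: List[Dict[str, Any]] = [
    {"name": "customer_id", "type": "string"},
    {"name": "change_scope", "type": "string",
     "enum": ["metadata_only", "permissions", "full_account"],
     "default": "metadata_only"},
    {"name": "dry_run", "type": "boolean", "default": True},
]


def resolve_params(specs: List[Dict[str, Any]], submitted: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: apply defaults, enforce enums, coerce booleans."""
    resolved: Dict[str, Any] = {}
    for spec in specs:
        name = spec["name"]
        if name in submitted:
            value: Any = submitted[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            # Assumption: no default means the form must supply a value.
            raise ValueError(f"missing required parameter: {name}")
        if spec["type"] == "boolean" and isinstance(value, str):
            if value.lower() not in ("true", "false"):
                raise ValueError(f"{name} must be true or false")
            value = value.lower() == "true"
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
        resolved[name] = value
    return resolved
```

Submitting only `customer_id` yields `change_scope="metadata_only"` and `dry_run=True`, so the safe defaults apply unless the operator changes them.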
During a run, Dagu resolves dependencies, starts ready steps, captures stdout and stderr, tracks status, applies retry rules, pauses for approvals, stores artifacts, and updates the Web UI in real time.
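The run loop described above can be modeled in a few lines. This is a simplified, hypothetical Python sketch, not Dagu's scheduler. It starts ready steps one at a time rather than concurrently, does not capture output or store artifacts, and represents each step as a Python callable. The step fields (`depends`, `retries`, `approval`) are illustrative stand-ins for the YAML, and the `approve` callback stands in for a human decision in the Web UI.

```python
from typing import Any, Callable, Dict

# Hypothetical step shape: {"depends": [...], "action": callable returning
# True on success, "retries": extra attempts, "approval": needs sign-off}.
StepSpec = Dict[str, Any]


def run_dag(steps: Dict[str, StepSpec], approve: Callable[[str], bool]) -> Dict[str, str]:
    """Run steps in dependency order and return each step's final status."""
    status = {name: "pending" for name in steps}
    while True:
        # A step is ready once every dependency has succeeded.
        ready = [
            name
            for name, step in steps.items()
            if status[name] == "pending"
            and all(status[dep] == "succeeded" for dep in step.get("depends", []))
        ]
        if not ready:
            break
        for name in ready:
            step = steps[name]
            # Approval gate: the step proceeds only if someone approves it.
            if step.get("approval") and not approve(name):
                status[name] = "rejected"
                continue
            # Retry rule: one initial attempt plus `retries` extra attempts.
            attempts = 1 + step.get("retries", 0)
            succeeded = any(step["action"]() for _ in range(attempts))
            status[name] = "succeeded" if succeeded else "failed"
    # Steps still pending were blocked by an upstream failure or rejection.
    return {name: "skipped" if s == "pending" else s for name, s in status.items()}
```

If you map the three account-repair steps to callables and approve the review step, all three steps succeed. If the review is rejected, `repair_account` never starts and is reported as skipped.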
Core Terminology
Understanding Dagu is easier once the main terms are clear.
| Term | Meaning |
|---|---|
| DAG | A workflow file written in YAML. Steps run according to dependencies, so the execution order is explicit. |
| Step | One unit of work. A step can run a command, container, SSH command, HTTP request, SQL query, readiness wait, sub-workflow, or AI agent task. |
| Action | The kind of work a step runs, such as run, docker.run, kubernetes.run, ssh.run, http.request, postgres.query, wait.http, s3.upload, or agent.run. You can also define custom actions, call third-party actions, or use official actions such as duckdb@v1. |
| Dagu Action | A versioned action package such as python-script@v1, duckdb@v1, or ffmpeg@v1. |
| Parameter | A declared run input with a name, type, default, description, or allowed values. Parameters power the generated Web UI start form and keep submitted values visible with the run. |
| Tool | A pinned CLI package declared in the DAG's tools field. Dagu installs these before the run so host command steps use the expected binary version. |
| State | A small JSON value stored across DAG runs with state.* actions, useful for cursors, checkpoints, previous snapshots, and change detection. |
| Run | One execution of a DAG. Runs keep status, logs, timing, outputs, and artifacts. |
| Notification | A UI-managed route that sends run events to Slack, email, Telegram, Google Chat, or webhooks. |
| Incident | A provider-backed failure lifecycle that opens on final failure, deduplicates repeated failures, and resolves after recovery. |
| Schedule | Cron-based automation for starting DAG runs, including timezone support. |
| Queue | Concurrency control for workflows, useful when jobs must not overlap or when workers are shared. |
| Worker | A machine that executes tasks in distributed mode. Workers can be selected by labels such as region, GPU, or environment. |
| Artifact | A file produced by a run and stored with the run history for preview, download, or audit. |
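The incident lifecycle in the table can be pictured as a small state machine. The following Python sketch is hypothetical. In Dagu the lifecycle is provider-backed, and this model covers only the open, deduplicate, and resolve transitions for each DAG. Each call represents one run's final status after retries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IncidentTracker:
    """Tracks at most one open incident per DAG (illustrative model only)."""

    # DAG name -> number of final failures folded into the open incident
    open_incidents: Dict[str, int] = field(default_factory=dict)

    def record(self, dag: str, final_status: str) -> Optional[str]:
        """Feed one run's final status; return the lifecycle event, if any."""
        if final_status == "failed":
            if dag in self.open_incidents:
                # Repeated failure: fold into the existing incident.
                self.open_incidents[dag] += 1
                return "deduplicated"
            self.open_incidents[dag] = 1
            return "opened"
        if final_status == "succeeded" and dag in self.open_incidents:
            # Recovery: close the incident that the earlier failure opened.
            del self.open_incidents[dag]
            return "resolved"
        return None
```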
See Core Concepts for the deeper model.
Why Teams Choose Dagu
Teams choose Dagu when they want workflow orchestration without adopting a large platform. Start with one binary, describe work in YAML, keep existing scripts and tools intact, and add operational controls only where they help.
Self-contained
Install one executable. The default quickstart setup includes the scheduler, workflow runtime, and Web UI without requiring an external DBMS or message broker.
Declarative YAML
Define workflows in a simple, declarative YAML format. Dependencies, parameters, schedules, retries, and execution controls are visible in one file and easy to review in Git.
Persistent workflow state
Store small JSON state across runs for cursors, checkpoints, previous snapshots, and change detection without adding an external database.
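The cursor pattern that persistent state enables looks roughly like this. In Dagu you would use the state.* actions rather than managing files yourself. This Python sketch stands in for those actions with a local JSON file, and `load_state`, `save_state`, and `incremental_run` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_state(path: Path, default: Any) -> Any:
    """Return the JSON value saved by a previous run, or a default on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return default


def save_state(path: Path, value: Any) -> None:
    """Persist a small JSON value for the next run."""
    path.write_text(json.dumps(value))


def incremental_run(path: Path, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Process only records newer than the stored cursor, then advance it."""
    state = load_state(path, {"cursor": 0})
    new = [r for r in records if r["id"] > state["cursor"]]
    if new:
        state["cursor"] = max(r["id"] for r in new)
        save_state(path, state)
    return new
```

Running it twice on the same input processes the records only once, because the second run starts from the saved cursor.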
CLI-oriented
Use existing scripts and commands, Python jobs, SQL, dbt, DuckDB, containers, SSH operations, HTTP calls, and other tools without modification.
Built-in Web UI
Inspect runs, read logs, retry failed steps, start workflows, and review history from the Web UI instead of SSHing into servers for routine operation.
Native executors
Run shell commands, Docker containers, Kubernetes Jobs, SSH commands, HTTP requests, SQL queries, sub-workflows, and agent steps.
File-backed state
Run history, logs, and artifacts are stored as files by default, keeping local and self-hosted deployments simple.
Scales gradually
Start on one machine, then move heavy, regional, or specialized jobs to distributed workers with label-based routing.
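Label-based routing can be thought of as matching a step's selector against each worker's labels. This Python sketch assumes exact key/value matching, where every selector label must be present on the worker with the same value. The worker names and labels are made up, and Dagu's real matching rules may be richer.

```python
from typing import Dict, List

# Hypothetical workers and their labels.
WORKERS: Dict[str, Dict[str, str]] = {
    "gpu-tokyo-1": {"region": "ap-northeast-1", "gpu": "true"},
    "cpu-tokyo-1": {"region": "ap-northeast-1", "gpu": "false"},
    "gpu-virginia-1": {"region": "us-east-1", "gpu": "true"},
}


def matching_workers(workers: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return workers whose labels contain every key/value pair in the selector."""
    return sorted(
        name
        for name, labels in workers.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )
```

A selector of `{"gpu": "true", "region": "us-east-1"}` narrows the candidates to `gpu-virginia-1`. An empty selector leaves every worker eligible.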
AI and MCP ready
Use MCP-capable agents to inspect state, preview workflow changes, and operate runs, or call agent CLIs from workflow steps when automation needs AI assistance.
Architecture at a Glance
Dagu can run in a small local setup or scale out when workloads grow. The operating model changes, but the workflow YAML does not need to be rewritten.
Standalone
dagu start-all runs the Web UI, scheduler, and workflow runtime in one process.
Best for one server, a team utility box, a private automation host, or getting started quickly.
Headless
Run workflows from the CLI or API without relying on the Web UI.
Best for CI-like automation, locked-down servers, or environments where Dagu is managed by another system.
Coordinator and Workers
The scheduler queues work, the coordinator assigns tasks, and workers execute DAGs over gRPC.
Best for many machines, GPU jobs, regional routing, mixed workloads, and high-throughput batch processing.
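The queueing side of this model can be illustrated with a FIFO queue that caps how many runs are active at once. This hypothetical Python sketch covers only the concurrency limit. It leaves out persistence, worker selection, and the gRPC transport between the coordinator and workers, and `BoundedQueue` and `max_active` are illustrative names.

```python
from collections import deque
from typing import Deque, List, Set


class BoundedQueue:
    """FIFO queue that lets at most `max_active` runs execute at once."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.waiting: Deque[str] = deque()
        self.active: Set[str] = set()

    def enqueue(self, run_id: str) -> None:
        self.waiting.append(run_id)

    def dispatch(self) -> List[str]:
        """Start as many waiting runs as free slots allow; return the ones started."""
        started: List[str] = []
        while self.waiting and len(self.active) < self.max_active:
            run_id = self.waiting.popleft()
            self.active.add(run_id)
            started.append(run_id)
        return started

    def finish(self, run_id: str) -> None:
        """Free the slot held by a completed run."""
        self.active.discard(run_id)
```

With `max_active=1`, the same class gives the "jobs must not overlap" behavior mentioned in the Queue definition.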
See Architecture for internals and storage, and Deployment Models for local, self-hosted, managed, and hybrid deployment options.
How Dagu Is Different
| Existing problem | Dagu path |
|---|---|
| Operational tasks are scattered across scripts, SQL files, SSH commands, API calls, cron entries, and engineer runbooks | One YAML workflow with parameters, dependencies, approvals, retries, logs, artifacts, and run controls. |
| A custom admin UI is needed just to let support or operations teams run a safe command | Declare parameters in YAML and let Dagu generate the Web UI input form, validation, logs, and run history. |
| A cloud job platform would move execution away from private data, credentials, and internal networks | Run workflows where the data, credentials, files, and existing CLIs already live. |
| A large orchestrator is too much infrastructure for scripts and runbooks | Start with one binary and file-backed state, then add queues and distributed workers only when needed. |
| Important runbooks still require manual SSH sessions and tribal knowledge | Reviewed workflows give operators safe execution while engineers keep commands, logs, outputs, and approvals traceable. |
Real-World Use Cases
Dagu is useful anywhere existing scripts, containers, SQL jobs, operational tasks, or agent-driven jobs need parameters, approvals, scheduling, retries, visibility, and a safe way for a team to run them.
ETL and Data Operations
Run: PostgreSQL and SQLite queries, DuckDB through the official action, dbt commands, S3 transfers, pinned jq or yq tools, readiness waits, validation steps, and sub-workflows.
Why Dagu fits: daily data workflows stay declarative, run close to private data, remain easy to inspect in the Web UI, and are straightforward to retry when one step fails.
Cron and Legacy Script Management
Run: existing shell scripts, Python scripts, HTTP calls, and scheduled jobs without rewriting them.
Why Dagu fits: dependencies, logs, retries, and run history become visible in the Web UI instead of being hidden across crontabs and server log files.
Media Conversion
Run: shell-driven media tools like ffmpeg, thumbnail extraction, audio normalization, image processing, and other compute-heavy jobs.
Why Dagu fits: conversion work can run across distributed workers while run history, logs, and artifacts stay visible in one place for monitoring, debugging, and retries.
Infrastructure and Server Automation
Run: SSH backups, cleanup jobs, deploy scripts, patch windows, precondition checks, and lifecycle hooks.
Why Dagu fits: remote operations get schedules, retries, notifications, incident routing, and per-step logs without requiring operators to SSH into servers for every recovery.
GitHub-driven Workflows
Run: PR validation, preview deployments, release workflows, check reruns, workflow_dispatch, and repository_dispatch from GitHub.
Why Dagu fits: GitHub Integration keeps GitHub as the trigger source while Dagu executes the DAG on your licensed server and reports checks, reactions, and comments back to GitHub.
Container and Kubernetes Workflows
Run: Docker images, Kubernetes Jobs, shell glue, and follow-up validation steps.
Why Dagu fits: teams can compose image-based tasks and route them to the right workers with worker labels instead of building a custom control plane.
Customer Support Automation
Run: diagnostics, account repair jobs, data checks, and approval-gated support actions.
Why Dagu fits: non-engineers can run reviewed workflows from the Web UI while engineers keep logs and results traceable.
IoT and Edge Workflows
Run: sensor polling, local cleanup, offline sync, health checks, and device maintenance jobs.
Why Dagu fits: the single binary works well on small devices while still providing visibility through the Web UI.
AI Agent Workflows
Run: AI agent steps, agent-authored YAML workflows, log analysis, repair steps, and human-reviewed automation.
Why Dagu fits: workflows stay in plain YAML, so agents can create and debug them while humans keep logs, approvals, and run history in one place.
TIP
If it can run from a shell command, Docker image, Kubernetes Job, SSH session, HTTP call, readiness wait, or SQL query, Dagu can usually orchestrate it without rewriting the underlying tool. For portable host CLIs, add tools so the DAG controls the binary version too.
AI Agents and Workflow Operator
Dagu includes AI features, but they build on the same self-contained workflow engine. The built-in MCP server lets MCP-capable agents read Dagu state, preview or apply workflow changes, and start, enqueue, retry, or stop runs. Agent steps and external agent CLIs can also run inside workflows, with the same scheduling, logs, retries, approvals, and run history as any other step.
```yaml
steps:
  - id: analyze_logs
    action: agent.run
    with:
      task: |
        Analyze /var/log/app/errors.log from the last hour.
        Summarize likely causes and suggest a safe recovery plan.
    output: ANALYSIS_RESULT
```

Workflow Operator connects Slack, Telegram, Discord, or LINE to the built-in steward, so teams can ask for run status, debug failures, re-run workflows, and approve actions from chat.
- MCP Server explains how agents can inspect state and operate workflows through Dagu.
- AI Agent Authoring explains workflow generation and debugging with coding agents.
- Agent Step explains how to run agent tasks inside DAGs.
- Workflow Operator explains chat-operator setup.
