Data Flow
How data moves through your workflows - from parameters to outputs, between steps, and across workflows.
Overview
Dagu provides multiple mechanisms for passing data through workflows:
- Output Variables - Capture command output for use in later steps
- Environment Variables - Define variables accessible to all steps
- Parameters - Pass runtime values into workflows
- File-based Passing - Redirect output to files
- JSON Path References - Access nested data structures
- Step ID References - Reference step properties and files
- Sub DAG Outputs - Capture results from sub-workflows
- DAG Run Outputs - Collect string-form outputs into a structured file for viewing and API access
Output Variables
Capture command output and use it in subsequent steps:
```yaml
steps:
  - command: cat VERSION
    output: VERSION
  - command: docker build -t myapp:${VERSION} .
```

How It Works
- Command stdout is captured (up to the max_output_size limit)
- Stored in the variable name specified by output
- Available to all downstream steps via ${VARIABLE_NAME}
- Trailing newlines are automatically trimmed
- String-form output: NAME values are collected into outputs.json for the DAG run (see DAG Run Outputs)
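Because only stdout is captured, anything a command writes to stderr does not end up in the variable (a minimal sketch using the syntax above; step contents are illustrative):

```yaml
steps:
  - command: sh -c 'echo "captured"; echo "not captured" >&2'
    output: RESULT
  - command: echo "RESULT is ${RESULT}"  # RESULT holds only the stdout text
```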
String vs Object Form
The output field supports both string and object forms:
```yaml
steps:
  # Simple string form
  - id: get_version
    command: cat VERSION
    output: VERSION

  # Object form publishes structured step-scoped output
  - id: inspect_build
    script: |
      printf '{"version":"v1.2.3","artifact":{"url":"https://example.test/app.tgz"}}'
    output:
      version:
        from: stdout
        decode: json
        select: .version
      artifact:
        from: stdout
        decode: json
        select: .artifact
```

String form captures stdout into one flat variable and includes it in outputs.json. Object form publishes structured step output for ${step_id.output.*} references. Each entry can be:
- a literal value
- from: stdout, from: stderr, or from: file, with an optional decode: json|yaml|text and select: .path.to.value
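For instance, a step that prints YAML can publish several nested fields at once. This sketch reuses only the keys listed above (from, decode, select); the step and field names are made up:

```yaml
steps:
  - id: emit_config
    script: |
      printf 'service:\n  name: billing\n  port: 8080\n'
    output:
      service_name:
        from: stdout
        decode: yaml
        select: .service.name
      service_port:
        from: stdout
        decode: yaml
        select: .service.port
```

Downstream steps could then reference ${emit_config.output.service_name} and ${emit_config.output.service_port}.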
Multiple Outputs
String-form output: NAME captures one flat variable per step. Use object-form output when one step should publish multiple structured values:
```yaml
type: graph
steps:
  - id: count_users
    command: wc -l < users.txt
    output: USER_COUNT
  - id: count_orders
    command: wc -l < orders.txt
    output: ORDER_COUNT
  - id: report
    command: |
      echo "Users: ${USER_COUNT}"
      echo "Orders: ${ORDER_COUNT}"
    depends:
      - count_users
      - count_orders
```

JSON Path References
Access nested values in JSON output using dot notation:
```yaml
steps:
  - command: |
      echo '{
        "database": {
          "host": "localhost",
          "port": 5432,
          "credentials": {
            "username": "app_user"
          }
        }
      }'
    output: CONFIG
  - command: |
      psql -h ${CONFIG.database.host} \
           -p ${CONFIG.database.port} \
           -U ${CONFIG.database.credentials.username}
```

Array Access
Access array elements by index:
```yaml
steps:
  - command: |
      echo '[
        {"name": "web1", "ip": "10.0.1.1"},
        {"name": "web2", "ip": "10.0.1.2"}
      ]'
    output: SERVERS
  - command: ping -c 1 ${SERVERS[0].ip}
```

Environment Variables
DAG-Level Variables
Define variables available to all steps:
```yaml
env:
  - LOG_LEVEL: debug
  - DATA_DIR: /var/data
  - API_URL: https://api.example.com

steps:
  - command: python process.py --log=${LOG_LEVEL} --data=${DATA_DIR}
```

Variable Expansion
Reference other variables:
```yaml
env:
  - BASE_DIR: ${HOME}/project
  - DATA_DIR: ${BASE_DIR}/data
  - OUTPUT_DIR: ${BASE_DIR}/output
  - CONFIG_FILE: ${DATA_DIR}/config.yaml
```

Command Substitution
Execute commands and use their output:
```yaml
env:
  - TODAY: "`date +%Y-%m-%d`"
  - GIT_COMMIT: "`git rev-parse HEAD`"
  - HOSTNAME: "`hostname -f`"

steps:
  - command: tar -czf backup-${TODAY}-${GIT_COMMIT}.tar.gz data/
```

Parameters
Named Parameters
Define parameters with defaults:
```yaml
params:
  - ENVIRONMENT: dev
  - BATCH_SIZE: 100
  - DRY_RUN: false

steps:
  - command: |
      echo "Processing data" \
        --env=${ENVIRONMENT} \
        --batch=${BATCH_SIZE} \
        --dry-run=${DRY_RUN}
```

Override at runtime:

```bash
dagu start workflow.yaml -- ENVIRONMENT=prod BATCH_SIZE=500
```

Dynamic Values
Use env: for computed values. DAG-level params: are treated as literal input and do not evaluate backticks or ${VAR}.
```yaml
env:
  - DATE: "`date +%Y-%m-%d`"
  - RUN_ID: "`uuidgen`"
  - USER: "`whoami`"
```

Step ID References
Reference step properties using the id field:
```yaml
steps:
  - id: risky
    command: 'sh -c "if [ $((RANDOM % 2)) -eq 0 ]; then echo Success; else echo Failed && exit 1; fi"'
    continue_on:
      failure: true
  - command: |
      if [ "${risky.exit_code}" = "0" ]; then
        echo "Success! Checking output..."
        cat ${risky.stdout}   # Read content from the file
      else
        echo "Failed with code ${risky.exit_code}"
        echo "Error log:"
        cat ${risky.stderr}   # Read content from the file
      fi
```

Available properties:
- ${id.exit_code} - Exit code of the step (as a string, e.g., "0" or "1")
- ${id.stdout} - Path to the stdout log file
- ${id.stderr} - Path to the stderr log file
- ${id.output} - Captured string output or structured object-form payload (requires output: on the referenced step)
Important: ${id.stdout} and ${id.stderr} return file paths, not the actual output content. Use cat ${id.stdout} to read the content. To capture output content directly into a variable for use in subsequent steps, use the output: field instead.
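A minimal sketch of the difference (step names are illustrative):

```yaml
steps:
  - id: greet
    command: echo hello
    output: GREETING
  - command: |
      echo "captured value: ${GREETING}"     # expands to the text itself
      echo "log file path: ${greet.stdout}"  # expands to a file path, not the text
      cat ${greet.stdout}                    # reading that file prints the text
```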
Step Output References
When a step has an output: field, downstream steps can access it via ${id.output}:
```yaml
type: graph
steps:
  - id: extract_title
    output: RESULT
    script: |
      printf 'Quarterly Revenue'
  - id: extract_summary
    output: RESULT
    script: |
      printf 'Revenue grew 18 percent year over year.'
  - id: report
    depends: [extract_title, extract_summary]
    script: |
      printf 'Title: %s\nSummary: %s' "${extract_title.output}" "${extract_summary.output}"
    output: REPORT
```

String-form ${id.output} returns the captured text value.
Object-form ${id.output} returns the full published object as compact JSON, and nested access works directly:
```yaml
steps:
  - id: publish
    output:
      version: "v1.2.3"
      artifact:
        url: "https://example.test/app.tgz"
  - command: |
      echo "Version: ${publish.output.version}"
      echo "Artifact: ${publish.output.artifact.url}"
```

${id.output} vs ${id.stdout}:
- ${id.output} returns the captured value or published object payload.
- ${id.stdout} returns a file path to the stdout log. Use cat ${id.stdout} to read it.
If the referenced step does not have output: configured, ${id.output} is not expanded — it passes through as a literal string.
Substring slicing works on the output value: ${id.output:0:5} extracts the first 5 characters of the captured output.
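For example, slicing can shorten a long value such as a commit hash (a sketch; the step name and command are illustrative):

```yaml
steps:
  - id: commit
    command: git rev-parse HEAD
    output: COMMIT_SHA
  - command: echo "short sha: ${commit.output:0:7}"
```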
Sub DAG Outputs
Capture outputs from nested workflows:
Basic Child Output
```yaml
# parent.yaml
steps:
  - call: etl-workflow
    params: "DATE=${TODAY}"
    output: ETL_RESULT
  - command: |
      echo "Status: ${ETL_RESULT.status}"
      echo "Records: ${ETL_RESULT.outputs.record_count}"
      echo "Duration: ${ETL_RESULT.outputs.duration}"
```

Output Structure
Sub DAG output contains:
```json
{
  "name": "etl-workflow",
  "params": "DATE=2024-01-15",
  "status": "succeeded",
  "outputs": {
    "record_count": "1000",
    "duration": "120s"
  }
}
```

Nested DAG Outputs
Access outputs from deeply nested workflows:
```yaml
steps:
  - call: main-pipeline
    output: PIPELINE
  - command: |
      # Access nested outputs
      echo "ETL Status: ${PIPELINE.outputs.ETL_OUTPUT.status}"
      echo "ML Score: ${PIPELINE.outputs.ML_OUTPUT.outputs.accuracy}"
```

Parallel Execution Outputs
When running parallel executions, outputs are aggregated:
```yaml
steps:
  - call: region-processor
    parallel:
      items: ["us-east", "us-west", "eu-central"]
    output: RESULTS
  - command: |
      echo "Total regions: ${RESULTS.summary.total}"
      echo "Succeeded: ${RESULTS.summary.succeeded}"
      echo "Failed: ${RESULTS.summary.failed}"

      # Access individual results
      echo "US-East revenue: ${RESULTS.outputs[0].revenue}"
      echo "US-West revenue: ${RESULTS.outputs[1].revenue}"
```

Parallel Output Structure
```json
{
  "summary": {
    "total": 3,
    "succeeded": 3,
    "failed": 0
  },
  "results": [
    {
      "params": "us-east",
      "status": "succeeded",
      "outputs": {
        "revenue": "1000000"
      }
    }
    // ... more results
  ],
  "outputs": [
    {"revenue": "1000000"},
    {"revenue": "750000"},
    {"revenue": "500000"}
  ]
}
```

File-Based Data Passing
Output Redirection
Redirect output to files for large data:
```yaml
steps:
  - command: python generate_report.py
    stdout: /tmp/report.txt
  - command: mail -s "Report" user@example.com < /tmp/report.txt
```

Working with Files
```yaml
steps:
  - command: |
      tar -xzf data.tar.gz -C /tmp/
      ls /tmp/data/ > /tmp/filelist.txt
  - command: |
      while read file; do
        process.sh "/tmp/data/$file"
      done < /tmp/filelist.txt
```

Special Environment Variables
Dagu automatically injects run metadata such as DAG_RUN_ID, DAG_RUN_STEP_NAME, and log file locations. See Special Environment Variables for the complete reference.
Example usage:
```yaml
steps:
  - command: |
      echo "Backing up logs for ${DAG_NAME} run ${DAG_RUN_ID}"
      cp ${DAG_RUN_LOG_FILE} /backup/
```

Output Size Limits
Control maximum output size to prevent memory issues:
```yaml
# Set a 5 MB limit for all steps
max_output_size: 5242880

steps:
  - command: cat large-file.json
    output: DATA            # Fails if output exceeds 5 MB
  - command: generate-huge-file.sh
    stdout: /tmp/huge.txt   # No size limit with file redirection
```

Variable Resolution Order
Variables are resolved with step-level taking highest precedence:
- Step-level environment - overrides everything
- Output variables - from completed steps
- Secrets - from the secrets: block
- DAG-level environment - includes env: and dotenv files
Example:
```yaml
env:
  - MESSAGE: "DAG level"

steps:
  - env:
      - MESSAGE: "Step level"  # This wins
    command: echo "${MESSAGE}"
```

For detailed precedence rules including interpolation vs runtime environment, see Variables Reference.
DAG Run Outputs
Only string-form output: NAME values are collected into outputs.json when the DAG completes. Object-form output: {...} stays step-scoped for ${step.output.*} references and is not written to the run outputs file.
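To make the split concrete, a sketch (step names and values are illustrative): BUILD_ID below uses string form and would be collected into outputs.json, while the object-form output of publish stays available only through ${publish.output.*}:

```yaml
steps:
  - id: build
    command: echo build-123
    output: BUILD_ID           # string form: collected into outputs.json
  - id: publish
    script: |
      printf '{"version":"v1.2.3"}'
    output:                    # object form: step-scoped only
      version:
        from: stdout
        decode: json
        select: .version
```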
See Outputs for complete documentation on output collection, access methods, and security features.
