DuckDB
Run DuckDB SQL from a workflow with the official duckdb@v1 action.
DuckDB is packaged as an official action instead of a built-in SQL executor. This keeps the Dagu core binary portable and cgo-free while still letting workflows use a pinned DuckDB CLI through the action's own tools declaration. Dagu tools is powered internally by aqua from the aquaproj project.
Official Action
Use action: duckdb@v1. Older built-in forms such as duckdb.query and duckdb.import are not part of the core SQL step types.
Basic Usage
type: graph
steps:
- id: query
action: duckdb@v1
with:
query: |
SELECT 42 AS answer, 'duckdb' AS engine;
- id: print_result
depends: [query]
run: printf '%s\n' '${query.outputs.result}'The default output format is DuckDB JSON mode, so result is a JSON string:
[{"answer":42,"engine":"duckdb"}]Use ${query.outputs.result} for small results such as counts, IDs, status rows, or compact JSON, replacing query with your step id. For large rowsets, write to an artifact or file instead of routing the data through action outputs.
Existing DuckDB Files
Use database to run SQL against an existing DuckDB file:
type: graph
steps:
- id: query_existing_db
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
SELECT count(*) AS users FROM users;Use workdir when the database path or files referenced by SQL should be resolved relative to a directory:
type: graph
steps:
- id: query_project_db
action: duckdb@v1
with:
workdir: /data/project
database: analytics.duckdb
query: |
SELECT * FROM read_csv_auto('events.csv') LIMIT 10;The database file must exist on the worker that runs the action. In distributed shared-nothing mode, use a shared mount, object storage, or an absolute path available on that worker. Omitting database creates a transient in-memory database for that one action invocation, so it cannot share state with later steps.
Use readonly: true when a step should only inspect an existing database:
steps:
- id: inspect
action: duckdb@v1
with:
database: /data/analytics.duckdb
readonly: true
query: |
SELECT table_name
FROM information_schema.tables
ORDER BY table_name;Multiple Operations
For tightly coupled operations, run multiple SQL statements in one action. This keeps them in one DuckDB process and lets you control the transaction boundary:
type: graph
steps:
- id: update_metrics
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS metrics (
name VARCHAR,
value INTEGER
);
DELETE FROM metrics WHERE name = 'runs';
INSERT INTO metrics VALUES ('runs', 1);
COMMIT;
SELECT * FROM metrics WHERE name = 'runs';For separate DAG visibility, use multiple action steps against the same database file and connect them with depends:
type: graph
steps:
- id: insert_rows
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
CREATE TABLE IF NOT EXISTS metrics (
name VARCHAR,
value INTEGER
);
DELETE FROM metrics WHERE name = 'jobs';
INSERT INTO metrics VALUES ('jobs', 10);
- id: update_rows
depends: [insert_rows]
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
UPDATE metrics SET value = value + 5 WHERE name = 'jobs';
- id: select_rows
depends: [update_rows]
action: duckdb@v1
with:
database: /data/analytics.duckdb
readonly: true
query: |
SELECT * FROM metrics WHERE name = 'jobs';
- id: print_result
depends: [select_rows]
run: printf '%s\n' '${select_rows.outputs.result}'Keep write operations ordered with depends. Parallel writes to the same DuckDB file can conflict because DuckDB uses file-level locking semantics.
Import Data
Use DuckDB SQL functions such as read_csv_auto, read_json_auto, and read_parquet to load files.
Create or replace a table from CSV:
steps:
- id: import_orders
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
CREATE OR REPLACE TABLE orders AS
SELECT *
FROM read_csv_auto('/data/orders.csv');Append rows from JSONL:
steps:
- id: append_events
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
INSERT INTO events
SELECT *
FROM read_json_auto('/data/events.jsonl');Load one DuckDB database from another by using ATTACH:
steps:
- id: copy_between_databases
action: duckdb@v1
with:
database: /data/target.duckdb
query: |
ATTACH '/data/source.duckdb' AS source_db;
INSERT INTO target_table
SELECT *
FROM source_db.source_table;Export Data
For large or typed datasets, write files directly from SQL. When the target path is under ${DAG_RUN_ARTIFACTS_DIR}, Dagu stores it as a run artifact.
steps:
- id: export_parquet
action: duckdb@v1
with:
database: /data/analytics.duckdb
query: |
COPY (
SELECT id, name, score
FROM source_table
WHERE score >= 80
)
TO '${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.parquet'
(FORMAT parquet);You can load that file in a later step when the artifact directory is readable by the worker running the step:
steps:
- id: export_parquet
action: duckdb@v1
with:
database: /data/source.duckdb
query: |
COPY (
SELECT id, name, score
FROM source_table
WHERE score >= 80
)
TO '${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.parquet'
(FORMAT parquet);
- id: insert_parquet
depends: [export_parquet]
action: duckdb@v1
with:
database: /data/target.duckdb
query: |
INSERT INTO target_table
SELECT *
FROM read_parquet('${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.parquet');In distributed shared-nothing mode, an artifact path may be worker-local while the run is still executing. For cross-worker data handoff, use a shared mounted path, object storage, or keep the transfer inside one DuckDB statement with ATTACH and INSERT INTO ... SELECT.
Stdout Artifacts
If the query result should be stored as a file instead of action output, call the pinned DuckDB CLI directly and attach stdout to an artifact:
type: graph
tools:
- duckdb/duckdb@v1.5.2
steps:
- id: export_rows
run: |
duckdb -batch -bail -no-stdin -csv /data/source.duckdb \
-c "SELECT id, name, score FROM source_table WHERE score >= 80"
stdout:
artifact: exports/selected_rows.csvThis keeps the CSV out of Dagu output variables while making it available in the run's Artifacts tab.
Inputs
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
query | string | Yes | - | SQL passed to duckdb -c. |
database | string | No | transient in-memory database | Database file path. Use an absolute path when the file lives outside the action workspace. |
workdir | string | No | action workspace | Directory to cd into before running DuckDB. Use this when SQL references local files with relative paths. |
format | string | No | json | Output format: json, csv, table, markdown, line, list, or column. |
readonly | boolean | No | false | Open the database in read-only mode. |
Outputs
| Name | Type | Description |
|---|---|---|
result | string | Raw DuckDB stdout in the selected format. |
Local Development
Use source: to call a local checkout of the action:
steps:
- id: query
action: source:file:///path/to/duckdb@local
with:
query: SELECT 1 AS ok;Remote actions run in their own action workspace. If a query needs files from a caller workspace, pass workdir and use paths that exist on the worker running the action.
See Also
- Official Actions - Maintained
dagucloud/*action packages - ETL Overview - Built-in PostgreSQL and SQLite step types
- Artifacts - Persisting run-scoped output files
- Tools - Pinned CLI tools for command steps
