CLI And Runner API

Use MoDaCor either from the command line (modacor run) or from Python via a shared execution helper (run_pipeline_job). Both paths use the same pipeline scheduler backend.

Prerequisites

Install MoDaCor in an environment with Python 3.12+.
If using a source checkout, install in editable mode:

pip install -e .

Command-line interface

The package installs a modacor command with a run subcommand.

Minimal run

modacor run --pipeline processing_pipelines/MOUSE_solids.yaml

This form suits pipelines that already load their data sources and sinks internally (for example, using AppendSource and AppendSink steps).

Run with external source registration

For pipelines that expect sources to be provided externally:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --hdf-source sample=/data/MOUSE_sample.nxs \
  --hdf-source background=/data/MOUSE_background.nxs \
  --yaml-source defaults=/data/defaults.yaml

Supported source/sink registration flags:

--hdf-source REF=PATH (repeatable)
--yaml-source REF=PATH (repeatable)
--csv-sink REF=PATH (repeatable)
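
The REF=PATH form used by these flags can be pictured with a short Python sketch. The helper name parse_ref_path is hypothetical and is not part of MoDaCor; splitting at the first "=" is an assumption about how such a value would reasonably be interpreted, not a description of the actual CLI parser.

```python
from pathlib import Path


def parse_ref_path(spec: str) -> tuple[str, Path]:
    """Split a 'REF=PATH' string into (reference, path).

    Hypothetical helper for illustration; not MoDaCor's own parser.
    Splitting at the first '=' (an assumption) allows '=' inside the path.
    """
    ref, sep, location = spec.partition("=")
    ref, location = ref.strip(), location.strip()
    if not sep or not ref or not location:
        raise ValueError(f"expected REF=PATH, got {spec!r}")
    return ref, Path(location)


# Example values taken from the command shown earlier in this section.
for spec in ["sample=/data/MOUSE_sample.nxs", "defaults=/data/defaults.yaml"]:
    reference, path = parse_ref_path(spec)
    print(reference, path)
```

Rejecting values without a reference or a location early gives a clearer error than a failure deep inside a pipeline run.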

The session API wrapper offers the same source and sink registration for long-lived runtime sessions:

modacor session set-source \
  --session-id mouse-main \
  --ref sample \
  --type hdf \
  --location /data/MOUSE_sample.nxs

modacor session set-sink \
  --session-id mouse-main \
  --ref export_csv \
  --type csv \
  --location /data/out/current.csv \
  --kwargs-json '{"delimiter": ","}'

Use modacor session delete-sink --session-id mouse-main --ref export_csv to remove a registered runtime sink.

Tracing and step control

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --trace \
  --trace-watch sample:signal,Q \
  --trace-watch background:signal \
  --trace-report-lines 50 \
  --stop-after GX

--trace enables trace capture and event attachment.
--trace-watch is repeatable and uses bundle:key[,key...].
--stop-after STEP_ID stops the run after that step is executed.
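
The bundle:key[,key...] syntax resembles the trace_watch mapping accepted by the Python runner (for example {"sample": ["signal"]}). The sketch below shows one way such strings could be turned into that mapping. The function parse_trace_watch is hypothetical, and merging repeated bundles is an assumption rather than documented MoDaCor behavior.

```python
def parse_trace_watch(specs: list[str]) -> dict[str, list[str]]:
    """Turn 'bundle:key[,key...]' strings into {bundle: [key, ...]}.

    Hypothetical helper for illustration; not MoDaCor's own parser.
    Repeated specs for the same bundle are merged (an assumption).
    """
    watch: dict[str, list[str]] = {}
    for spec in specs:
        bundle, sep, keys = spec.partition(":")
        bundle = bundle.strip()
        names = [key.strip() for key in keys.split(",") if key.strip()]
        if not sep or not bundle or not names:
            raise ValueError(f"expected bundle:key[,key...], got {spec!r}")
        bucket = watch.setdefault(bundle, [])
        for name in names:
            if name not in bucket:  # keep first-seen order, drop duplicates
                bucket.append(name)
    return watch


# Values taken from the command shown above.
print(parse_trace_watch(["sample:signal,Q", "background:signal"]))
```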

Export selected results to HDF5

Use repeatable --write-path selectors:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --write-hdf output/results.h5 \
  --run-name run1 \
  --write-path /sample/signal/signal \
  --write-path /sample/Q/signal \
  --write-path /background/signal/signal

To store the entire current ProcessingData (all BaseData entries) without listing paths:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --write-hdf output/results_full.h5 \
  --write-all-processing-data

Semantics:

--write-hdf sets the output file.
Each --write-path adds one ProcessingData path to data_paths.
--write-all-processing-data auto-selects all BaseData entries from ProcessingData.
--run-name maps to the HDF sink run subpath (default: default).
The HDF output stores reproducibility metadata under processing/pipeline/<run-name>/: spec (JSON) and yaml (pipeline YAML).
Trace output is stored under processing/tracer/<run-name>/ as raw events JSON, together with indexed steps/ and index/ entries.
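
When several exports are launched from a script, the flags described above can be assembled programmatically. The following is a minimal sketch: build_run_command is a hypothetical helper, not part of MoDaCor, and falling back to --write-all-processing-data when no paths are given is a design choice of this sketch only.

```python
import shlex


def build_run_command(pipeline, write_hdf, run_name="default", write_paths=()):
    """Assemble a 'modacor run' argument list for an HDF5 export.

    Hypothetical helper; it only uses flags documented in this section.
    """
    cmd = [
        "modacor", "run",
        "--pipeline", str(pipeline),
        "--write-hdf", str(write_hdf),
        "--run-name", run_name,
    ]
    if write_paths:
        for path in write_paths:
            cmd += ["--write-path", path]  # one flag per ProcessingData path
    else:
        # Sketch-specific fallback: export everything when no path is listed.
        cmd.append("--write-all-processing-data")
    return cmd


cmd = build_run_command(
    "processing_pipelines/MOUSE_solids.yaml",
    "output/results.h5",
    run_name="run1",
    write_paths=["/sample/signal/signal", "/sample/Q/signal"],
)
print(shlex.join(cmd))  # shell-quoted form, convenient for logging
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where the modacor command is installed.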

Shared Python runner API

Use this API to drive MoDaCor from notebooks or scripts while keeping behavior consistent with the CLI:

from pathlib import Path

from modacor.io.hdf.hdf_source import HDFSource
from modacor.io.io_sources import IoSources
from modacor.runner import run_pipeline_job

sources = IoSources()
sources.register_source(
    HDFSource(source_reference="sample", resource_location=Path("/data/sample.nxs"))
)

result = run_pipeline_job(
    Path("processing_pipelines/MOUSE_solids.yaml"),
    sources=sources,
    trace=True,
    trace_watch={"sample": ["signal"]},
)

print(result.executed_steps)
print(result.processing_data.keys())

run_pipeline_job(...) returns a RunResult container with the following attributes:

processing_data
pipeline
tracer (or None)
step_durations
executed_steps
stopped_after_step
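
As an illustration of working with these fields, the sketch below ranks steps by runtime. It assumes that step_durations behaves like a mapping from step ID to a duration in seconds; this section does not state the actual type, so treat that as an assumption and adapt it to the real RunResult. The function slowest_steps is hypothetical, and a plain dict with made-up values stands in for result.step_durations.

```python
from collections.abc import Mapping


def slowest_steps(step_durations: Mapping[str, float], top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` slowest steps as (step_id, seconds), slowest first.

    Hypothetical helper; assumes a step-ID -> seconds mapping.
    """
    ranked = sorted(step_durations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Stand-in data with made-up values; with a real run this would be
# result.step_durations (under the mapping assumption stated above).
durations = {"step_a": 0.12, "step_b": 1.50, "step_c": 0.48}
for step_id, seconds in slowest_steps(durations, top=2):
    print(f"{step_id}: {seconds:.2f} s")
```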
