CLI and Runner API

Use MoDaCor either from the command line (modacor run) or from Python via a shared execution helper (run_pipeline_job). Both paths use the same pipeline scheduler backend.

Prerequisites

  • Install MoDaCor in an environment with Python 3.12+.

  • If using a source checkout, install in editable mode:

pip install -e .

Command-line interface

The package installs a modacor command with a run subcommand.

Minimal run

modacor run --pipeline processing_pipelines/MOUSE_solids.yaml

This form suits pipelines that set up their own data sources and sinks internally (for example, with AppendSource and AppendSink steps).

Run with external source registration

For pipelines that expect sources to be provided externally:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --hdf-source sample=/data/MOUSE_sample.nxs \
  --hdf-source background=/data/MOUSE_background.nxs \
  --yaml-source defaults=/data/defaults.yaml

Supported source/sink registration flags:

  • --hdf-source REF=PATH (repeatable)

  • --yaml-source REF=PATH (repeatable)

  • --csv-sink REF=PATH (repeatable)
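The REF=PATH form used by these flags can be illustrated with a short Python sketch. The helper name parse_ref_path is hypothetical and is not part of MoDaCor; splitting on the first "=" (so that paths may themselves contain "=") is an assumption about sensible behavior, not a statement about how the CLI parses its arguments.

```python
def parse_ref_path(spec: str) -> tuple[str, str]:
    """Split a 'REF=PATH' string into (reference, path).

    Hypothetical helper for illustration only; MoDaCor's own parsing may differ.
    """
    # partition() splits on the first '=' only, so '=' inside the path survives.
    ref, sep, path = spec.partition("=")
    if not sep or not ref or not path:
        raise ValueError(f"expected REF=PATH, got {spec!r}")
    return ref, path


# The same specs as in the command above, collected into a mapping.
specs = [
    "sample=/data/MOUSE_sample.nxs",
    "background=/data/MOUSE_background.nxs",
]
registrations = dict(parse_ref_path(s) for s in specs)
print(registrations)
```

Each reference (sample, background) is the name by which the pipeline is expected to look up the registered file.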

Tracing and step control

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --trace \
  --trace-watch sample:signal,Q \
  --trace-watch background:signal \
  --trace-report-lines 50 \
  --stop-after GX

  • --trace enables trace capture and event attachment.

  • --trace-watch is repeatable and uses bundle:key[,key...].

  • --stop-after STEP_ID stops the run after that step is executed.
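To make the bundle:key[,key...] format concrete, the sketch below turns repeated --trace-watch values into a mapping of the same shape as the trace_watch argument shown in the Python example further down. The function parse_trace_watch is a hypothetical illustration, not MoDaCor's parser, and the assumption that repeated flags for the same bundle are merged is ours.

```python
def parse_trace_watch(specs: list[str]) -> dict[str, list[str]]:
    """Convert 'bundle:key[,key...]' strings into {bundle: [key, ...]}.

    Hypothetical helper for illustration only; MoDaCor's own parsing may differ.
    """
    watch: dict[str, list[str]] = {}
    for spec in specs:
        bundle, sep, keys = spec.partition(":")
        if not sep or not bundle or not keys:
            raise ValueError(f"expected bundle:key[,key...], got {spec!r}")
        watched = watch.setdefault(bundle, [])
        for key in keys.split(","):
            key = key.strip()
            # Skip empty fragments and avoid duplicate keys per bundle.
            if key and key not in watched:
                watched.append(key)
    return watch


watch = parse_trace_watch(["sample:signal,Q", "background:signal"])
print(watch)  # {'sample': ['signal', 'Q'], 'background': ['signal']}
```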

Export selected results to HDF5

Use repeatable --write-path selectors:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --write-hdf output/results.h5 \
  --run-name run1 \
  --write-path /sample/signal/signal \
  --write-path /sample/Q/signal \
  --write-path /background/signal/signal

To store the entire current ProcessingData (all BaseData entries) without listing individual paths:

modacor run \
  --pipeline processing_pipelines/MOUSE_solids.yaml \
  --write-hdf output/results_full.h5 \
  --write-all-processing-data

Semantics:

  • --write-hdf sets the output file.

  • each --write-path adds one ProcessingData path to data_paths.

  • --write-all-processing-data auto-selects all BaseData entries from ProcessingData.

  • --run-name maps to the HDF sink run subpath (default: default).

  • the HDF output stores reproducibility metadata under processing/pipeline/<run-name>/: spec (JSON) and yaml (pipeline YAML).

  • trace output is stored under processing/tracer/<run-name>/ as raw events JSON, alongside indexed steps/ and index/ subpaths.
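The layout described in the last two items can be summarized as a small path helper. This is only a sketch: run_metadata_paths is a hypothetical name, and whether spec and yaml are stored as datasets or attributes is not stated here, so the returned strings should be read as locations to look for, not as a guaranteed file schema.

```python
from posixpath import join  # HDF5 paths always use forward slashes


def run_metadata_paths(run_name: str = "default") -> dict[str, str]:
    """Return the expected HDF5 locations of reproducibility metadata.

    Hypothetical helper built from the documented layout; not a MoDaCor API.
    """
    pipeline_root = join("processing", "pipeline", run_name)
    tracer_root = join("processing", "tracer", run_name)
    return {
        "spec": join(pipeline_root, "spec"),    # pipeline spec (JSON)
        "yaml": join(pipeline_root, "yaml"),    # pipeline YAML
        "tracer": tracer_root,                  # raw trace events live here
        "tracer_steps": join(tracer_root, "steps"),
        "tracer_index": join(tracer_root, "index"),
    }


for name, path in run_metadata_paths("run1").items():
    print(f"{name}: {path}")
```

With a reader such as h5py, these strings could be used as keys into the opened output file to check what a given run recorded.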

Shared Python runner API

Use this when driving MoDaCor in notebooks or scripts while keeping behavior consistent with the CLI:

from pathlib import Path

from modacor.io.hdf.hdf_source import HDFSource
from modacor.io.io_sources import IoSources
from modacor.runner import run_pipeline_job

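# Register the input file under the reference name the pipeline expects.
sources = IoSources()
sources.register_source(
    HDFSource(source_reference="sample", resource_location=Path("/data/sample.nxs"))
)

# Run the pipeline with tracing enabled, watching the "signal" key of the
# "sample" bundle (the Python counterpart of --trace and --trace-watch).
result = run_pipeline_job(
    Path("processing_pipelines/MOUSE_solids.yaml"),
    sources=sources,
    trace=True,
    trace_watch={"sample": ["signal"]},
)

# Inspect which steps ran and which top-level ProcessingData keys exist.
print(result.executed_steps)
print(result.processing_data.keys())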

run_pipeline_job(...) returns a RunResult container with:

  • processing_data

  • pipeline

  • tracer (or None)

  • step_durations

  • executed_steps

  • stopped_after_step
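As one way to use these fields, the sketch below ranks steps by run time. It assumes that step_durations behaves like a mapping from step ID to a duration in seconds, which is not confirmed here; the helper slowest_steps and the example values are made up for illustration.

```python
from collections.abc import Mapping


def slowest_steps(
    step_durations: Mapping[str, float], top: int = 3
) -> list[tuple[str, float]]:
    """Return the `top` longest-running steps as (step_id, seconds) pairs.

    Assumes a step-ID -> seconds mapping; adapt if RunResult stores
    durations differently.
    """
    ranked = sorted(step_durations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Made-up durations standing in for result.step_durations.
example = {"LD": 0.82, "BG": 0.05, "GX": 1.40}
for step_id, seconds in slowest_steps(example, top=2):
    print(f"{step_id}: {seconds:.2f} s")
```

In a real session, passing result.step_durations in place of example would give a quick view of where a pipeline spends its time, provided the assumption about its structure holds.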