Quickstart#

Run a three-step MoDaCor pipeline against the bundled MOUSE example dataset, see how data sources plug into a configuration file, and inspect the pipeline trace that records what changed at every step.

Prerequisites#

  • Python 3.11 or newer

  • pip, curl (or wget) and a POSIX-like shell

  • Approximately 1.3 GB of free disk space for the sample NeXus file

If you are working from a clone of the MoDaCor repository, activate the project's virtual environment instead of creating a new one, and use pip install -e . to install the package in editable mode.

Step 1 – Prepare a working folder#

Create a clean folder, bootstrap a virtual environment, and install MoDaCor:

mkdir modacor-quickstart
cd modacor-quickstart
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install modacor
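
To confirm the install before moving on, you can run a short check from the Python interpreter. It uses only the standard library, so it assumes nothing about the MoDaCor API:

from importlib.metadata import version

import modacor  # noqa: F401 -- raises ImportError if the installation is broken

print("modacor version:", version("modacor"))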

Step 2 – Download example data and metadata#

Grab the MOUSE sample dataset and create a small metadata file describing the detector dark current:

curl -LO https://github.com/BAMresearch/modacor/raw/main/tests/testdata/MOUSE_20250324_1_160_stacked.nxs

cat <<'YAML' > mouse_metadata.yaml
---
detector:
  darkcurrent:
    value: 1.0e-5
    units: counts/second
    uncertainty: 1.0e-6
YAML

The NeXus file exposes counts in entry1/instrument/detector00/data and the exposure time in entry1/instrument/detector00/frame_exposure_time. The metadata file supplies a scalar dark-current estimate so the last pipeline step can remove it.
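
Before building the pipeline, you can confirm that those paths exist in the downloaded file. The check below uses h5py (pip install h5py), which is not otherwise required by this quickstart; only the dataset paths are taken from above, the rest is plain inspection:

import h5py

with h5py.File("MOUSE_20250324_1_160_stacked.nxs", "r") as nxs:
    counts = nxs["entry1/instrument/detector00/data"]
    exposure = nxs["entry1/instrument/detector00/frame_exposure_time"]
    print("counts:", counts.shape, counts.attrs.get("units", b"counts"))
    print("exposure time:", exposure[()], exposure.attrs.get("units", b"s"))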

Step 3 – Create the pipeline configuration#

Save the following pipeline definition as mouse_quickstart.yaml:

name: mouse_quickstart
steps:
  1:
    name: add_poisson_uncertainties
    module: PoissonUncertainties
    requires_steps: []
    configuration:
      with_processing_keys:
        - sample
  2:
    name: normalize_by_exposure
    module: Divide
    requires_steps: [1]
    configuration:
      with_processing_keys:
        - sample
      divisor_source: sample::entry1/instrument/detector00/frame_exposure_time
      divisor_units_source: sample::entry1/instrument/detector00/frame_exposure_time@units
  3:
    name: subtract_darkcurrent
    module: Subtract
    requires_steps: [2]
    configuration:
      with_processing_keys:
        - sample
      subtrahend_source: metadata::detector/darkcurrent/value
      subtrahend_units_source: metadata::detector/darkcurrent/units
      subtrahend_uncertainties_sources:
        propagate_to_all: metadata::detector/darkcurrent/uncertainty

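In this configuration, sample:: paths resolve against the NeXus file and metadata:: paths against mouse_metadata.yaml; both prefixes are registered as data sources in the next step. Numerically, the three steps attach Poisson (square-root) counting uncertainties, divide by the exposure time, and subtract the scalar dark-current rate while propagating its uncertainty. The numpy sketch below illustrates that arithmetic with made-up numbers; it is not how MoDaCor implements these modules:

import numpy as np

counts = np.array([[120.0, 98.0], [134.0, 101.0]])  # hypothetical detector counts
exposure_s = 2.0                                     # hypothetical frame exposure time
dark_rate, dark_sigma = 1.0e-5, 1.0e-6               # values from mouse_metadata.yaml

# Step 1: Poisson counting uncertainties on the raw counts
sigma_counts = np.sqrt(counts)

# Step 2: normalise to a count rate; the uncertainty scales the same way
rate = counts / exposure_s
sigma_rate = sigma_counts / exposure_s

# Step 3: subtract the dark-current rate and combine uncertainties in quadrature
corrected = rate - dark_rate
sigma_corrected = np.sqrt(sigma_rate**2 + dark_sigma**2)

print(corrected)
print(sigma_corrected)
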
Step 4 – Create a runner script#

Place the script below in run_mouse_pipeline.py. It registers the data sources, prepares a ProcessingData object, runs the pipeline, and prints both numeric results and a compact pipeline trace.

from __future__ import annotations

from pathlib import Path
from time import perf_counter

from modacor import ureg
from modacor.dataclasses.basedata import BaseData
from modacor.dataclasses.databundle import DataBundle
from modacor.dataclasses.processing_data import ProcessingData
from modacor.debug.pipeline_tracer import PipelineTracer, PlainUnicodeRenderer
from modacor.io.hdf.hdf_source import HDFSource
from modacor.io.io_sources import IoSources
from modacor.io.yaml.yaml_source import YAMLSource
from modacor.runner.pipeline import Pipeline


def _decode_unit(unit_value) -> str:
    if isinstance(unit_value, bytes):
        return unit_value.decode()
    return str(unit_value)


def build_processing_data(sources: IoSources) -> ProcessingData:
    processing = ProcessingData()
    processing["sample"] = DataBundle()

    # Read the detector counts and their units attribute from the NeXus file.
    signal = sources.get_data("sample::entry1/instrument/detector00/data")
    signal_unit = _decode_unit(
        sources.get_data_attributes("sample::entry1/instrument/detector00/data").get("units", "counts")
    )

    processing["sample"]["signal"] = BaseData(
        signal=signal,
        units=ureg.Unit(signal_unit),
        rank_of_data=2,  # last two dimensions carry detector pixels
    )
    return processing


def main() -> None:
    pipeline = Pipeline.from_yaml_file(Path("mouse_quickstart.yaml"))

    sources = IoSources()
    # Register each source under the prefix used in the configuration and in the
    # data references above ("metadata::..." and "sample::...").
    sources.register_source(
        YAMLSource(source_reference="metadata", resource_location=Path("mouse_metadata.yaml"))
    )
    sources.register_source(
        HDFSource(source_reference="sample", resource_location=Path("MOUSE_20250324_1_160_stacked.nxs"))
    )

    processing_data = build_processing_data(sources)
    tracer = PipelineTracer(watch={"sample": ["signal"]})

    # Execute steps as their prerequisites complete, timing each node and
    # recording a trace entry after it runs.
    pipeline.prepare()
    while pipeline.is_active():
        for node in pipeline.get_ready():
            node.processing_data = processing_data
            node.io_sources = sources

            start = perf_counter()
            node.execute(processing_data)
            tracer.after_step(node, processing_data, duration_s=perf_counter() - start)

            pipeline.done(node)

    sample_signal = processing_data["sample"]["signal"]
    mean_intensity = float(sample_signal.signal.mean())
    print(f"Mean intensity after corrections: {mean_intensity:.6g} {sample_signal.units}")

    print("\nPipeline trace (last few events):\n")
    print(tracer.last_report(renderer=PlainUnicodeRenderer()))

    print("\nMermaid flowchart definition:\n")
    print(pipeline.to_mermaid())


if __name__ == "__main__":
    main()

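If you want to verify the source wiring before running the whole pipeline, you can query a registered source directly. The snippet reuses only calls that already appear in the script above (register_source and get_data); treat it as a sketch rather than a complete reference for the IoSources API:

from pathlib import Path

from modacor.io.io_sources import IoSources
from modacor.io.yaml.yaml_source import YAMLSource

sources = IoSources()
sources.register_source(
    YAMLSource(source_reference="metadata", resource_location=Path("mouse_metadata.yaml"))
)
print(sources.get_data("metadata::detector/darkcurrent/value"))
print(sources.get_data("metadata::detector/darkcurrent/units"))
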
Step 5 – Run the pipeline#

Execute the script:

python run_mouse_pipeline.py

You should see the corrected mean intensity, a compact trace summarising what changed in each step (unit conversions, shape, NaN counts, etc.), and a Mermaid flowchart definition that can be pasted into https://mermaid.live for a visual graph.

Step 6 – Where to go next#

  • Swap out mouse_metadata.yaml for the metadata produced by your instrument and adjust with_processing_keys for additional DataBundle entries (for example background or calibration); a sketch of adding an extra entry follows this list.

  • Add pipeline.attach_tracer_event(node, tracer, include_rendered_trace=True) inside the execution loop if you want to export the trace alongside the configuration.

  • Explore the Pipeline operations and Extending MoDaCor sections for branching workflows, module development, and integration best practices.
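
As a starting point for the first bullet, an extra DataBundle entry can be prepared inside build_processing_data from Step 4 alongside the sample entry. The background:: prefix and dataset path below are hypothetical and would need their own registered source; adjust them to whatever your instrument writes:

    # Hypothetical second entry; register a source under the "background" prefix first.
    processing["background"] = DataBundle()
    processing["background"]["signal"] = BaseData(
        signal=sources.get_data("background::entry1/instrument/detector00/data"),
        units=ureg.Unit("counts"),
        rank_of_data=2,
    )

A pipeline step that should run on both entries would then list both sample and background under with_processing_keys.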