Getting Started with bam-masterdata¶
This tutorial will guide you through your first interaction with the bam-masterdata
package, helping you understand its core concepts and basic functionality.
What is bam-masterdata?¶
bam-masterdata is a Python package designed to help administrators and users manage Masterdata/schema definitions. It provides a set of Python classes and utilities for working with different types of entities in the openBIS Research Data Management (RDM) system. It also contains the Masterdata definitions used at the Bundesanstalt für Materialforschung und -prüfung (BAM) in the context of Materials Science and Engineering research.
[Image placeholder: Architecture overview diagram showing the relationship between BAM Masterdata, openBIS, and the BAM Data Store. The diagram should illustrate data flow and the role of masterdata schemas in the system.]
bam-masterdata provides you with tools to:
- Export the Masterdata from your openBIS instance.
- Update the Masterdata in your openBIS instance.
- Export/Import from different formats: Excel, Python, RDF/XML, JSON.
- Check consistency of the Masterdata with respect to a ground truth.
- Automatically parse metainformation in your openBIS instance.
Prerequisites
- Basic Python and openBIS knowledge.
- A system with Python 3.10 or higher.
- Knowledge of virtual environments, CLI usage, IDEs such as VSCode, and GitHub.
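The Python version requirement can be checked from the interpreter itself. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of bam-masterdata.

```python
import sys

# Minimum version stated in the prerequisites
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor} "
    f"supported: {python_is_supported()}"
)
```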
Warning
All steps in this documentation were carried out on Ubuntu 22.04. The terminal commands need to be adapted if you work on Windows.
Installation and Setup¶
Create an empty test directory¶
We will test the basic functionalities of bam-masterdata in an empty directory. Open your terminal, create a new empty directory, and change into it.
Create a Virtual Environment¶
We strongly recommend using a virtual environment, created with either venv or conda, to avoid conflicts with other packages.
Install the Package¶
bam-masterdata is available on PyPI and can be installed via pip:
pip install bam-masterdata
Faster Installation
For faster installation, you can use uv:
uv pip install bam-masterdata
Verify Installation¶
You can verify that the installation was successful. Create a Python script containing:
from importlib.metadata import version
print(f"BAM Masterdata version: {version('bam_masterdata')}")
Running the script in your terminal should print the version of the installed package.
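If the package might be missing from the active environment, the check can report that instead of raising. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "bam-masterdata"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("bam-masterdata is not installed in this environment")
else:
    print(f"BAM Masterdata version: {found}")
```

A `None` result usually means the virtual environment is not activated or the installation step was skipped.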
Your First bam-masterdata Experience¶
Understanding Entity Types¶
The BAM Masterdata system organizes information into different entity types:
- Object Types: Physical or conceptual objects (samples, instruments, people)
- Collection Types: Groups of related objects
- Dataset Types: Data files and their metadata
- Vocabulary Types: Controlled vocabularies for standardized values
[Image placeholder: Entity relationship diagram showing the four main entity types and their relationships. Should include sample instances of each type.]
Deprecating Collection Types and Dataset Types
As of September 2025, the development of new Collection and Dataset types is on hold. Only the abstract concepts are used: a Collection Type is a class to which objects and their relationships are added, and a Dataset Type is a class to which raw data files are attached.
Overview of the Object Types¶
The central ingredients for defining data models associated with a research activity are the Object Types. These are classes that inherit from an abstract class called ObjectType and have two types of attributes:
- defs: The definitions of the Object Type. These attributes do not change when the object is filled with data.
- properties: The list of properties assigned to an object. These attributes are filled when assigning data to the object.
All accessible object types are defined as Python classes in bam_masterdata/datamodel/object_types.py.
Each object type has a set of assigned properties (metadata fields), some of which are mandatory and some are optional. For example:
class Chemical(ObjectType):
    defs = ObjectTypeDef(
        code="CHEMICAL",
        description="""Chemical Substance//Chemische Substanz""",
        generated_code_prefix="CHEM",
    )

    name = PropertyTypeAssignment(
        code="$NAME",
        data_type="VARCHAR",
        property_label="Name",
        description="""Name""",
        mandatory=True,
        show_in_edit_views=False,
        section="General Information",
    )

    alias = PropertyTypeAssignment(
        code="ALIAS",
        data_type="VARCHAR",
        property_label="Alternative Name",
        description="""e.g. abbreviation or nickname//z.B. Abkürzung oder Spitzname""",
        mandatory=False,
        show_in_edit_views=False,
        section="General Information",
    )

    # ... more PropertyTypeAssignment
See Schema Definitions to learn more about how Object Types are defined and how properties are assigned.
Creating Your First Entity¶
Let's instantiate a simple experimental step object:
from bam_masterdata.datamodel.object_types import ExperimentalStep

# Create a new experimental step instance
step = ExperimentalStep(
    name="SEM measurement",
    finished_flag=True,
)
print(step)  # prints the object type and its assigned properties
You can assign values to other properties after instantiation as well:
step.show_in_project_overview = True
print(step)
This will print:
SEM measurement:ExperimentalStep(name="SEM measurement", show_in_project_overview=True, finished_flag=True)
If the type of the property does not match the expected type, an error will be shown. For example, ExperimentalStep.show_in_project_overview is a boolean, so assigning a non-boolean value to it results in an error.
Available properties for an Object Type¶
To explore which attributes are available for a given type, check its _property_metadata.
from bam_masterdata.datamodel.object_types import ExperimentalStep
step = ExperimentalStep()
print(list(step._property_metadata.keys()))
If you want a detailed list of the PropertyTypeAssignment assigned to an Object Type, you can print properties instead.
Data types¶
The data types for each assigned property are defined according to openBIS. These have their direct counterpart in Python types. The following table shows the equivalency of each type:
DataType | Python type | Example assignment |
---|---|---|
BOOLEAN | bool | myobj.flag = True |
CONTROLLEDVOCABULARY | str (enum term code) | myobj.status = "ACTIVE" (must match allowed vocabulary term) |
DATE | datetime.date | myobj.start_date = datetime.date(2025, 9, 29) |
HYPERLINK | str | myobj.url = "https://example.com" |
INTEGER | int | myobj.count = 42 |
MULTILINE_VARCHAR | str | myobj.notes = "Line 1\nLine 2\nLine 3" |
OBJECT | (openBIS object reference) | myobj.parent = another_object_instance (depends on schema) |
REAL | float | myobj.temperature = 21.7 |
TIMESTAMP | datetime.datetime | myobj.created_at = datetime.datetime.now() |
VARCHAR | str | myobj.name = "Test sample" |
XML | str (XML string) | myobj.config = "<root><tag>value</tag></root>" |
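The correspondence in the table can be written down as a plain Python mapping, which is handy for pre-checking values in a script before assigning them. This is only a sketch: `PYTHON_TYPES` and `matches_data_type` are hypothetical names, not part of bam-masterdata, and the package's own validation may be stricter or more lenient (for instance regarding integers assigned to REAL properties).

```python
import datetime

# Mapping transcribed from the table above. OBJECT is omitted because its
# Python counterpart depends on the schema.
PYTHON_TYPES = {
    "BOOLEAN": bool,
    "CONTROLLEDVOCABULARY": str,
    "DATE": datetime.date,
    "HYPERLINK": str,
    "INTEGER": int,
    "MULTILINE_VARCHAR": str,
    "REAL": float,
    "TIMESTAMP": datetime.datetime,
    "VARCHAR": str,
    "XML": str,
}


def matches_data_type(data_type: str, value) -> bool:
    """Return True if value has the Python type listed for data_type."""
    expected = PYTHON_TYPES[data_type]
    # bool is a subclass of int, so reject True/False for non-boolean types
    if isinstance(value, bool) and expected is not bool:
        return False
    # datetime.datetime is a subclass of datetime.date; keep the two apart
    if expected is datetime.date and isinstance(value, datetime.datetime):
        return False
    return isinstance(value, expected)


print(matches_data_type("INTEGER", 42))
print(matches_data_type("INTEGER", True))
```

The two explicit subclass checks are a design choice of this sketch: without them, `True` would pass as an INTEGER and a `datetime.datetime` would pass as a DATE.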
Assigning controlled vocabularies¶
Many object types have fields that only accept certain values (controlled vocabularies). Use the value codes found in bam_masterdata/datamodel/vocabulary_types.py or check the class directly:
from bam_masterdata.datamodel.vocabulary_types import StorageValidationLevel
print([term.code for term in StorageValidationLevel().terms])
Only the term codes printed above can be assigned, for example:
from bam_masterdata.datamodel.object_types import Storage
store = Storage()
store.storage_storage_validation_level = "BOX" # CONTROLLEDVOCABULARY
Tip
When assigning values to the properties of Object Types, we recommend handling potential errors carefully. This allows your scripts to run without interruption while giving you full control over the problematic lines.
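One way to follow this recommendation is to wrap each assignment in a small helper that records failures instead of stopping the script. The sketch below is an illustration under stated assumptions: `try_assign` and `FakeStep` are hypothetical, `FakeStep` merely stands in for a real Object Type so the example runs without the package, and the exception types raised by bam-masterdata are assumed to be `TypeError` or `ValueError`.

```python
def try_assign(obj, attribute: str, value, errors: list) -> bool:
    """Assign value to obj.attribute; record the failure instead of raising."""
    try:
        setattr(obj, attribute, value)
    except (TypeError, ValueError) as exc:  # assumed exception types
        errors.append((attribute, value, str(exc)))
        return False
    return True


class FakeStep:
    """Hypothetical stand-in that mimics a type-checked boolean property."""

    def __init__(self):
        self._flag = None

    @property
    def finished_flag(self):
        return self._flag

    @finished_flag.setter
    def finished_flag(self, value):
        if not isinstance(value, bool):
            raise TypeError(f"finished_flag expects bool, got {type(value).__name__}")
        self._flag = value


errors = []
step = FakeStep()
try_assign(step, "finished_flag", True, errors)   # accepted
try_assign(step, "finished_flag", "yes", errors)  # rejected and recorded

for attribute, value, message in errors:
    print(f"Skipped {attribute}={value!r}: {message}")
```

Collecting the failures in a list lets a parser process a whole file and report all rejected values at the end.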
Saving your Object Type instances in a collection¶
Most use cases end with saving the Object Type instances and their field values in a collection for further use.
This can be done by adding those instances to a CollectionType:
from bam_masterdata.metadata.entities import CollectionType
from bam_masterdata.datamodel.object_types import ExperimentalStep
step_1 = ExperimentalStep(name="Step 1")
collection = CollectionType()
step_1_id = collection.add(step_1)
print(collection)
This prints the CollectionType with the attached objects.
You can also add relationships between objects by using the ids returned when they are attached to the CollectionType:
from bam_masterdata.metadata.entities import CollectionType
from bam_masterdata.datamodel.object_types import ExperimentalStep
step_1 = ExperimentalStep(name="Step 1")
step_2 = ExperimentalStep(name="Step 2")
collection = CollectionType()
step_1_id = collection.add(step_1)
step_2_id = collection.add(step_2)
_ = collection.add_relationship(parent_id=step_1_id, child_id=step_2_id)
print(collection)
CollectionType(attached_objects={'EXP3e6f674e': ExperimentalStep(name='Step 1'), 'EXP87b64b62': ExperimentalStep(name='Step 2')}, relationships={'EXP3e6f674e>>EXP87b64b62': ('EXP3e6f674e', 'EXP87b64b62')})
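The printed relationships are keyed as `<parent_id>>><child_id>` with a `(parent_id, child_id)` tuple as value. The sketch below groups child ids under their parent using a dictionary copied from that printed output; whether the same dictionary is exposed as an attribute of the collection is an assumption, and `children_by_parent` is a hypothetical helper. The ids are generated, so yours will differ.

```python
from collections import defaultdict

# Relationships copied from the printed CollectionType above:
# key "<parent_id>>><child_id>", value (parent_id, child_id).
relationships = {
    "EXP3e6f674e>>EXP87b64b62": ("EXP3e6f674e", "EXP87b64b62"),
}


def children_by_parent(relationships: dict) -> dict:
    """Group child ids under their parent id."""
    grouped = defaultdict(list)
    for parent_id, child_id in relationships.values():
        grouped[parent_id].append(child_id)
    return dict(grouped)


print(children_by_parent(relationships))  # {'EXP3e6f674e': ['EXP87b64b62']}
```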
Converting Object Types¶
The package supports various export formats for working with Object Types. They serve two main purposes:
- Exporting the schema definitions: this is done using the model_to_<format>() methods.
- Exporting the data model: this is done using the to_<format>() methods.
For example:
# Convert data model to dictionary
step_dict = step.to_dict()
# Convert schema to dictionary
step_schema_dict = step.model_to_dict()
print(step_dict) # print: {'name': 'SEM measurement', 'finished_flag': True}
print(step_schema_dict) # print: {'properties': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}], 'defs': {'code': 'EXPERIMENTAL_STEP', 'description': 'Experimental Step (generic)//Experimenteller Schritt (allgemein)', 'iri': None, 'id': 'ExperimentalStep', 'row_location': None, 'validation_script': None, 'generated_code_prefix': 'EXP', 'auto_generate_codes': True}}
# Convert to JSON
step_json = step.to_json()
step_schema_json = step.model_to_json()
The possible export formats can be found in the API Reference documentation page.
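An exported dictionary can be written to disk with the standard library. The sketch below uses a dictionary copied from the printed `step.to_dict()` output above rather than calling the package, and assumes all values are JSON-serializable (dates or timestamps would need converting first). `save_as_json` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Dictionary copied from the printed output of step.to_dict() above
step_dict = {"name": "SEM measurement", "finished_flag": True}


def save_as_json(data: dict, path: Path) -> Path:
    """Write data as indented JSON and return the path that was written."""
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


# A temporary directory keeps the example free of side effects
with tempfile.TemporaryDirectory() as tmp:
    target = save_as_json(step_dict, Path(tmp) / "step.json")
    restored = json.loads(target.read_text(encoding="utf-8"))

print(restored == step_dict)  # True
```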
Working with real raw data¶
To work with raw data, you need to create a script (also called a parser) that reads the metainformation of the input/output files and maps it to the corresponding Object Type classes and properties. You can find more information in How-to: Create new parsers.
Using the Command Line Interface¶
The package provides a CLI for common operations:
# Export current masterdata to Excel
bam_masterdata export_to_excel --export-dir=example_export_excel
# Fill masterdata from openBIS
bam_masterdata fill_masterdata
A comprehensive explanation of all options is printed in the terminal when you add the --help flag at the end of a command. For example, bam_masterdata export_to_json --help prints:
Usage: bam_masterdata export_to_json [OPTIONS]

  Export entities to JSON files to the `./artifacts/` folder.

Options:
  --force-delete BOOLEAN  (Optional) If set to `True`, it will delete the
                          current `./artifacts/` folder and create a new one.
                          Default is `False`.
  --python-path TEXT      (Optional) The path to the individual Python module
                          or the directory containing the Python modules to
                          process the datamodel. Default is the `/datamodel/`
                          directory.
  --export-dir TEXT       The directory where the JSON files will be exported.
                          Default is `./artifacts`.
  --single-json BOOLEAN   Whether the export to JSON is done to a single JSON
                          file. Default is False.
  --help                  Show this message and exit.
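The same help text can be retrieved from a Python script, for example to check that the CLI is available before running an automated export. This is a minimal sketch using only the standard library; `cli_help` is a hypothetical helper, and it returns `None` when the `bam_masterdata` executable is not on the PATH.

```python
import shutil
import subprocess


def cli_help(subcommand: str = "export_to_json"):
    """Return the --help text of a subcommand, or None if the CLI is not on PATH."""
    executable = shutil.which("bam_masterdata")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, subcommand, "--help"],
        capture_output=True,
        text=True,
        check=False,  # return whatever was printed instead of raising
    )
    return result.stdout


help_text = cli_help()
print(help_text if help_text is not None else "bam_masterdata CLI not found on PATH")
```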
Next Steps¶
Now that you've completed this tutorial, you can:
- Explore the How-to Guides: Learn specific tasks and workflows when using bam-masterdata.
- Read the Explanations: Understand the concepts behind the system.
- Browse the API Reference: Dive deep into specific classes and methods.
Development Setup¶
If you want to contribute or modify the package:
git clone https://github.com/BAMresearch/bam-masterdata.git
cd bam-masterdata
python3 -m venv .venv
source .venv/bin/activate
./scripts/install_python_dependencies.sh
Read the README.md for more details.