Reactions

The pipeline has the same shape as the molecules pipeline; what changes is the participants mapping and the load step. There is no derive_chemistry stage: each participant has its own SMILES column, so chemistry derivation runs inline inside load_reactions.

Skeleton

apps/backend/lcmd_db/registry/subsets/my_reactions.py

from lcmd_db.core.lib.importers import (
    Source, SourceType, Subset, float_prop, pipeline,
)
from lcmd_db.core.lib.importers.steps import (
    ParticipantMapping,
    attach_structures,
    http,
    load_reactions,
    read_csv,
)

participants = {
    "catalyst": ParticipantMapping(
        role="catalyst",
        smiles_col="cat_smiles",
        name_template="Catalyst: {cat_label}",
    ),
    "substrate": ParticipantMapping(
        role="substrate",
        smiles_col="sub_smiles",
        name_template="Substrate: {sub_label}",
        structure_col="_xyz_substrate",
        step_from=0.0,
    ),
    "product": ParticipantMapping(
        role="product",
        smiles_col="prod_smiles",
        name_template="Product",
        structure_col="_xyz_product",
        step_from=1.0,
    ),
}

my_reactions = Subset(
    name="MyReactions",
    description="...",
    source=Source(name="...", type=SourceType.OTHER, url="..."),
    reaction_properties=[
        float_prop("Activation energy", col="ea", units="kcal/mol", required=True),
        float_prop("Reaction energy", col="er", units="kcal/mol"),
    ],
    pipeline=pipeline(
        fetch=http(url="...", filename="reactions.csv"),
        parse=(
            read_csv(path="reactions.csv")
            >> attach_structures(pattern="xyz/sub/{sub_label}.xyz",
                                 column="_xyz_substrate", required=False)
            >> attach_structures(pattern="xyz/prod/{prod_label}.xyz",
                                 column="_xyz_product", required=False)
        ),
        load=load_reactions(
            participants=participants,
            reaction_name_template="{cat_label}_{sub_label}",
        ),
    ),
)

ParticipantMapping

Each entry in participants describes how to extract one participant from a row.

Argument	Purpose
`role`	One of `catalyst`, `co_catalyst`, `reactant`, `substrate`, `product`, `intermediate`, `transition_state`, `solvent`, `additive`
`name_template`	`format`-style template for the participant's display name. Row columns (e.g. `{cat_label}`) and `{label}` are available as fields
`label`	Override the label substituted into `name_template`. Defaults to the dict key
`smiles_col`	Row column containing the participant's SMILES
`inchi_col`	Alternative to SMILES; either is enough to load the participant
`inchi_key_col`	Row column with InChI key (used for dedup; not enough on its own to identify)
`structure_col`	Row column containing the participant's XYZ path (set by `attach_structures`)
`mw_col`	Row column with molecular weight; otherwise computed
`formula_col`	Row column with molecular formula; otherwise computed
`step_from`	Position on the reaction coordinate. Set it for participants that sit at a single point, such as substrates, intermediates, and products
`step_to`	End of a transition arc. Set both `step_from` and `step_to` for transition states
`display_preview`	Whether to show this participant in card previews. Default `True`

A participant with no SMILES, no InChI, and no structure is silently skipped for that row, because there is nothing to identify the molecule by.
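The skip rule can be pictured as a small predicate over the row. The sketch below is illustrative only: `Mapping` is a hypothetical stand-in for ParticipantMapping that carries just the three identifying columns, `is_identifiable` is not a library function, and treating an empty value the same as a missing column is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    """Hypothetical stand-in for ParticipantMapping (identifying columns only)."""
    smiles_col: Optional[str] = None
    inchi_col: Optional[str] = None
    structure_col: Optional[str] = None


def is_identifiable(mapping: Mapping, row: dict) -> bool:
    """Return True if the row supplies SMILES, InChI, or a structure for this mapping."""
    cols = (mapping.smiles_col, mapping.inchi_col, mapping.structure_col)
    # Assumption: a missing column or an empty value means "not provided".
    return any(col is not None and bool(row.get(col)) for col in cols)


substrate = Mapping(smiles_col="sub_smiles", structure_col="_xyz_substrate")
rows = [
    {"sub_smiles": "CCO", "_xyz_substrate": None},             # SMILES only
    {"sub_smiles": "", "_xyz_substrate": "xyz/sub/s2.xyz"},    # structure only
    {"sub_smiles": "", "_xyz_substrate": None},                # nothing: skipped
]
kept = [is_identifiable(substrate, row) for row in rows]
```

Running a check like this over the parsed rows before loading is one way to count how many participants would be dropped, since the loader itself skips them without a message.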

XYZ-only participants

When a participant carries only structure_col (no SMILES, no InChI), load_reactions infers SMILES from the XYZ block via the xyz_to_smiles converter before deriving chemistry inline. The default is RDKitXyzToSmiles(); pass any callable to override:

from lcmd_db.apps.molecules.services.conversion import RDKitXyzToSmiles

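# Default converter, configured with charge=-1
load_reactions(
    participants=participants,
    xyz_to_smiles=RDKitXyzToSmiles(charge=-1),
)

# Any callable works; my_inference is a placeholder for your own function
load_reactions(
    participants=participants,
    xyz_to_smiles=my_inference,
)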

Conversion is best-effort — when it fails, the participant keeps its XYZ file but downstream chemistry fields stay empty.
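Because any callable is accepted, several converters can be chained into one. The sketch below assumes the callable receives the XYZ block as a string and returns a SMILES string, or None when it cannot convert; that contract is an assumption, not something confirmed above. `with_fallback`, `failing`, and `water_only` are hypothetical helpers written for this example.

```python
from typing import Callable, Optional

XyzToSmiles = Callable[[str], Optional[str]]


def with_fallback(*converters: XyzToSmiles) -> XyzToSmiles:
    """Try each converter in order; return the first non-empty SMILES, else None."""
    def convert(xyz: str) -> Optional[str]:
        for converter in converters:
            try:
                smiles = converter(xyz)
            except Exception:
                # Best-effort: a failing converter just hands over to the next one.
                continue
            if smiles:
                return smiles
        return None
    return convert


def failing(xyz: str) -> Optional[str]:
    """Toy converter that always fails."""
    raise ValueError("could not perceive bonds")


def water_only(xyz: str) -> Optional[str]:
    """Toy converter that recognises a single water molecule by its atoms."""
    atoms = sorted(line.split()[0] for line in xyz.strip().splitlines()[2:])
    return "O" if atoms == ["H", "H", "O"] else None


WATER = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
HELIUM = "1\nhelium\nHe 0.0 0.0 0.0\n"

convert = with_fallback(failing, water_only)
water_smiles = convert(WATER)     # second converter succeeds
helium_smiles = convert(HELIUM)   # every converter gives up
```

Under the same assumption, the combined callable could be passed as xyz_to_smiles=convert.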

Load step

load_reactions(
    participants=participants,
    reaction_name_template="Reaction {idx}",       # default
    reaction_description_template=None,            # optional
    get_participants_fn=None,                      # optional, see below
    xyz_to_smiles=None,                            # optional, see XYZ-only above
)

reaction_name_template and reaction_description_template are formatted with the row dict plus an idx (0-based row index).
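The formatting step can be approximated with plain Python string formatting. This is a minimal sketch of the behaviour described above, not the loader's actual code; `render_name` is a hypothetical helper, and letting idx override a row column that happens to be named idx is an assumption.

```python
def render_name(template: str, row: dict, idx: int) -> str:
    """Format a template with the row's columns plus the 0-based row index."""
    # Assumption: idx takes precedence over any column that is also called "idx".
    return template.format_map({**row, "idx": idx})


rows = [
    {"cat_label": "c1", "sub_label": "s1"},
    {"cat_label": "c1", "sub_label": "s2"},
]
names = [render_name("{cat_label}_{sub_label}", row, i) for i, row in enumerate(rows)]
defaults = [render_name("Reaction {idx}", row, i) for i, row in enumerate(rows)]
```

In this sketch a template field that names a column absent from the row raises KeyError, so it is worth keeping template fields in step with the CSV header.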

Conditional participants

Some rows have more participants than others — common when several CSVs are merged. Pass get_participants_fn to pick keys per row:

def pick(row, idx):
    keys = ["catalyst", "substrate", "product"]
    if row.get("_in_srs"):           # marker column from read_csvs_merged
        keys += ["int1", "ts1", "int2"]
    return keys

load_reactions(participants=all_participants, get_participants_fn=pick)

The _in_srs marker comes from CSVSpec(marker="_in_srs", ...) in read_csvs_merged — see Advanced pipelines.
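To see what a picker returns before wiring it into the pipeline, it can be run over a few hand-written rows. The rows below are made up for illustration, and the marker values (True, False, or absent) are an assumption about what read_csvs_merged writes.

```python
def pick(row, idx):
    """Return the participant keys that apply to this row."""
    keys = ["catalyst", "substrate", "product"]
    if row.get("_in_srs"):  # truthy only for rows that came from the marked CSV
        keys += ["int1", "ts1", "int2"]
    return keys


# Hypothetical rows: one from the marked CSV, one marked False, one without the marker.
rows = [
    {"cat_label": "c1", "_in_srs": True},
    {"cat_label": "c2", "_in_srs": False},
    {"cat_label": "c3"},
]
selected = [pick(row, i) for i, row in enumerate(rows)]
```

Only the first row gets the three extra keys; the other two fall back to the base list because row.get returns a falsy value for them.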

Display configuration

The reaction card in the web UI previews one or more participants. By default the system chooses which ones to show; override the choice when it picks the wrong participant:

from lcmd_db.core.lib.importers import ReactionDisplayConfig

reaction_display_config = ReactionDisplayConfig(
    preview_participants=["int2"],   # participant keys, in priority order
    max_preview_participants=1,
)

Putting it together

Reference example: apps/backend/lcmd_db/registry/subsets/pictet_spengler.py — five merged CSVs, 13 participants across three reaction tiers, conditional get_participants_fn, explicit display config.