Reactions
Define a subset that imports reactions with multiple participants per row.
The pipeline shape mirrors molecules; what changes is the participants
mapping and the load step. There is no derive_chemistry stage: each
participant has its own SMILES column, so chemistry derivation runs inline
inside load_reactions.
Skeleton
from lcmd_db.core.lib.importers import (
    Source, SourceType, Subset, float_prop, pipeline,
)
from lcmd_db.core.lib.importers.steps import (
    ParticipantMapping,
    attach_structures,
    http,
    load_reactions,
    read_csv,
)

participants = {
    "catalyst": ParticipantMapping(
        role="catalyst",
        smiles_col="cat_smiles",
        name_template="Catalyst: {cat_label}",
    ),
    "substrate": ParticipantMapping(
        role="substrate",
        smiles_col="sub_smiles",
        name_template="Substrate: {sub_label}",
        structure_col="_xyz_substrate",
        step_from=0.0,
    ),
    "product": ParticipantMapping(
        role="product",
        smiles_col="prod_smiles",
        name_template="Product",
        structure_col="_xyz_product",
        step_from=1.0,
    ),
}

my_reactions = Subset(
    name="MyReactions",
    description="...",
    source=Source(name="...", type=SourceType.OTHER, url="..."),
    reaction_properties=[
        float_prop("Activation energy", col="ea", units="kcal/mol", required=True),
        float_prop("Reaction energy", col="er", units="kcal/mol"),
    ],
    pipeline=pipeline(
        fetch=http(url="...", filename="reactions.csv"),
        parse=(
            read_csv(path="reactions.csv")
            >> attach_structures(pattern="xyz/sub/{sub_label}.xyz",
                                 column="_xyz_substrate", required=False)
            >> attach_structures(pattern="xyz/prod/{prod_label}.xyz",
                                 column="_xyz_product", required=False)
        ),
        load=load_reactions(
            participants=participants,
            reaction_name_template="{cat_label}_{sub_label}",
        ),
    ),
)
ParticipantMapping
Each entry in participants describes how to extract one participant from a row.
| Argument | Purpose |
|---|---|
| role | One of catalyst, co_catalyst, reactant, substrate, product, intermediate, transition_state, solvent, additive |
| name_template | format-style template for the participant's display name. {label} available |
| label | Override the label substituted into name_template. Defaults to the dict key |
| smiles_col | Row column containing the participant's SMILES |
| inchi_col | Alternative to SMILES; either is enough to load the participant |
| inchi_key_col | Row column with InChI key (used for dedup; not enough on its own to identify) |
| structure_col | Row column containing the participant's XYZ path (set by attach_structures) |
| mw_col | Row column with molecular weight; otherwise computed |
| formula_col | Row column with molecular formula; otherwise computed |
| step_from | Reaction-coordinate position. Use for intermediates, products |
| step_to | End of a transition arc. Set both step_from and step_to for transition states |
| display_preview | Whether to show this participant in card previews. Default True |
A participant with no SMILES, no InChI, and no structure is silently skipped for that row, because there is no way to identify the molecule.
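The skip rule can be pictured as a small predicate over the row. This is a minimal sketch, not the library's implementation: the Mapping dataclass is a hypothetical stand-in for ParticipantMapping, and treating a missing column, None, and an empty string all as "no value" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    """Hypothetical stand-in for ParticipantMapping (identifier columns only)."""
    smiles_col: Optional[str] = None
    inchi_col: Optional[str] = None
    structure_col: Optional[str] = None


def is_identifiable(mapping: Mapping, row: dict) -> bool:
    """Return True if the row carries a SMILES, InChI, or structure value."""
    cols = (mapping.smiles_col, mapping.inchi_col, mapping.structure_col)
    # Assumption: a missing column, None, or "" all count as "no value".
    return any(col is not None and row.get(col) for col in cols)


substrate = Mapping(smiles_col="sub_smiles", structure_col="_xyz_substrate")
row_with_xyz = {"sub_smiles": "", "_xyz_substrate": "xyz/sub/a.xyz"}
row_empty = {"sub_smiles": "", "_xyz_substrate": None}

kept = is_identifiable(substrate, row_with_xyz)      # structure alone is enough
skipped = not is_identifiable(substrate, row_empty)  # nothing to identify with
```

Note that inchi_key_col is deliberately absent from the check, matching the table: an InChI key is not enough on its own to identify a participant.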
XYZ-only participants
When a participant carries only structure_col (no SMILES, no InChI),
load_reactions infers SMILES from the XYZ block via the xyz_to_smiles
converter before deriving chemistry inline. The default is
RDKitXyzToSmiles(); pass any callable to override:
from lcmd_db.apps.molecules.services.conversion import RDKitXyzToSmiles

# Built-in converter with a non-default charge
load_reactions(
    participants=participants,
    xyz_to_smiles=RDKitXyzToSmiles(charge=-1),
)

# Any other callable works as well
load_reactions(
    participants=participants,
    xyz_to_smiles=lambda xyz: my_inference(xyz),
)
Conversion is best-effort: when it fails, the participant keeps its XYZ file but downstream chemistry fields stay empty.
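A custom converter can follow the same best-effort convention. The sketch below assumes the callable receives the XYZ block as a string and that returning None signals a failed conversion; neither detail is confirmed by this page. best_effort and toy_inference are hypothetical names used only for illustration.

```python
from typing import Callable, Optional


def best_effort(convert: Callable[[str], str]) -> Callable[[str], Optional[str]]:
    """Wrap a converter so that any failure yields None instead of raising."""
    def wrapped(xyz: str) -> Optional[str]:
        try:
            smiles = convert(xyz)
        except Exception:
            # Assumption: None means "conversion failed, leave chemistry empty".
            return None
        return smiles or None  # normalise "" to None
    return wrapped


def toy_inference(xyz: str) -> str:
    """Hypothetical converter: recognises only a three-atom water block."""
    lines = xyz.strip().splitlines()
    # Skip the atom-count line and the comment line, keep element symbols.
    symbols = sorted(line.split()[0] for line in lines[2:])
    if symbols != ["H", "H", "O"]:
        raise ValueError("unsupported structure")
    return "O"  # SMILES for water


water = "3\nwater\nO 0.0 0.0 0.0\nH 0.76 0.59 0.0\nH -0.76 0.59 0.0\n"
converter = best_effort(toy_inference)
```

Under those assumptions, the wrapped callable could be passed as xyz_to_smiles=converter.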
Load step
load_reactions(
    participants=participants,
    reaction_name_template="Reaction {idx}",  # default
    reaction_description_template=None,       # optional
    get_participants_fn=None,                 # optional, see below
    xyz_to_smiles=None,                       # optional, see XYZ-only above
)
reaction_name_template and reaction_description_template are formatted with
the row dict plus an idx (0-based row index).
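The formatting step can be reproduced with plain str.format. This sketch assumes the template sees the row's columns as keyword arguments alongside idx; render_name is a hypothetical helper, not part of lcmd_db, and the two rows are made up.

```python
def render_name(template: str, row: dict, idx: int) -> str:
    """Format a template with the row's columns plus a 0-based idx."""
    # Assumption: idx takes precedence if the row also has an "idx" column.
    return template.format(**{**row, "idx": idx})


rows = [
    {"cat_label": "cat1", "sub_label": "subA"},
    {"cat_label": "cat2", "sub_label": "subB"},
]

default_names = [render_name("Reaction {idx}", r, i) for i, r in enumerate(rows)]
custom_names = [render_name("{cat_label}_{sub_label}", r, i) for i, r in enumerate(rows)]
```

With the default template every name depends only on the row's position, so a template built from row columns is the better choice when names should stay stable after rows are reordered.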
Conditional participants
Some rows have more participants than others — common when several CSVs are
merged. Pass get_participants_fn to pick keys per row:
def pick(row, idx):
    keys = ["catalyst", "substrate", "product"]
    if row.get("_in_srs"):  # marker column from read_csvs_merged
        keys += ["int1", "ts1", "int2"]
    return keys

load_reactions(participants=all_participants, get_participants_fn=pick)
The _in_srs marker comes from CSVSpec(marker="_in_srs", ...) in
read_csvs_merged; see Advanced pipelines.
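Because pick is an ordinary function of the row, it can be exercised on plain dicts without running a pipeline. The two rows below are made-up examples; only the _in_srs marker column and the participant keys come from the text above.

```python
def pick(row, idx):
    """Return the participant keys to load for this row."""
    keys = ["catalyst", "substrate", "product"]
    if row.get("_in_srs"):  # truthy only for rows flagged by the marker column
        keys += ["int1", "ts1", "int2"]
    return keys


base_row = {"cat_label": "cat1", "_in_srs": False}
full_row = {"cat_label": "cat1", "_in_srs": True}

base_keys = pick(base_row, 0)  # core participants only
full_keys = pick(full_row, 1)  # core participants plus intermediates and TS
```

Using row.get rather than row["_in_srs"] keeps the selector safe for rows where the marker column is absent.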
Display configuration
The reaction card in the web UI previews one or more participants. By default the system chooses them for you; override the choice when it picks the wrong one:
from lcmd_db.core.lib.importers import ReactionDisplayConfig

reaction_display_config = ReactionDisplayConfig(
    preview_participants=["int2"],  # participant keys, in priority order
    max_preview_participants=1,
)
Putting it together
Reference example:
apps/backend/lcmd_db/registry/subsets/pictet_spengler.py — five merged CSVs,
13 participants across three reaction tiers, conditional get_participants_fn,
explicit display config.