Molecules
=========

The **OSCAR!(NHC)** subset contains N-heterocyclic carbenes with 30+
computed stereoelectronic properties.

.. code-block:: python

   from lcmd_db import load_dataset
   import polars as pl

   data = load_dataset("oscar_nhc")
   molecules = data.as_dataset("molecules")

   mol = molecules[0]
   mol.properties["smiles"]         # str
   mol.properties["energy"]         # float
   mol.properties["cation_energy"]  # float

   # Filter and split
   heavy = molecules.filter(pl.col("molecular_weight") > 300)
   train, test = molecules.train_test_split(test_size=0.2)

Restrict columns to speed up downloads:

.. code-block:: python

   data = load_dataset(
       "oscar_nhc",
       molecule_properties=["smiles", "energy", "homo", "lumo"],
   )

.. tip::

   Restricting ``molecule_properties`` to only the columns you need
   significantly reduces download size.

Export
------

.. tab-set::

   .. tab-item:: Polars
      :sync: polars

      .. code-block:: python

         df = molecules.to_polars()
         df.filter(pl.col("energy") < -100).select("smiles", "energy")

   .. tab-item:: Pandas
      :sync: pandas

      .. code-block:: python

         df = molecules.to_pandas()
         df[df["energy"] < -100][["smiles", "energy"]]

   .. tab-item:: ASE
      :sync: ase

      .. code-block:: python

         # Requires: uv add ase
         # Include structures in the download
         data = load_dataset("oscar_nhc", include=["molecules", "structures"])
         molecules = data.as_dataset("molecules")
         atoms_list = molecules.to_ase()

Structures
----------

Include XYZ structure files in the download:

.. code-block:: python

   data = load_dataset("oscar_nhc", include=["molecules", "structures"])
   mol = data.as_dataset("molecules")[0]
   mol.structure_path  # Path to .xyz file

.. seealso::

   :class:`~lcmd_db.MoleculeDataset` --- full API reference,
   :func:`~lcmd_db.load_dataset` --- all loading options,
   :doc:`typed-stubs` --- IDE autocomplete for property keys
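
If ASE is not installed, the file behind ``mol.structure_path`` can still be read with the standard library. The following is a minimal sketch, assuming the files follow the conventional XYZ layout (atom count on the first line, a free-form comment on the second, then one ``element x y z`` row per atom). The helper ``read_xyz`` is hypothetical and not part of ``lcmd_db``, and the water geometry is only a stand-in so the example runs without a download.

```python
from pathlib import Path
import tempfile


def read_xyz(path):
    """Parse a conventional XYZ file into (symbols, coords, comment).

    Hypothetical helper for illustration; not part of lcmd_db.
    """
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].strip())  # line 1: number of atoms
    comment = lines[1].strip()       # line 2: free-form comment
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        # Keep only the first four fields; extra columns are ignored
        element, x, y, z = line.split()[:4]
        symbols.append(element)
        coords.append((float(x), float(y), float(z)))
    if len(symbols) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(symbols)}")
    return symbols, coords, comment


# Stand-in file for demonstration; with a downloaded dataset you would
# call read_xyz(mol.structure_path) instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "water.xyz"
    demo.write_text(
        "3\n"
        "water\n"
        "O 0.000 0.000 0.117\n"
        "H 0.000 0.757 -0.471\n"
        "H 0.000 -0.757 -0.471\n"
    )
    symbols, coords, comment = read_xyz(demo)
```

Coordinates in XYZ files are conventionally given in ångströms. For anything beyond quick inspection, ``molecules.to_ase()`` from the Export section is likely the more convenient route.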