Dataset
Base Class
- class Dataset
Bases: Generic[~E]
Generic lazy dataset backed by a tabular data source.
Provides lazy loading, integer/slice indexing, polars-based filtering, column selection, train/test splitting, and export to polars, pandas, or ASE formats.
- property df: polars.DataFrame
Materialised polars DataFrame (collected on first access).
- property lazy: polars.LazyFrame
Underlying polars LazyFrame for deferred computation.
- select(*columns)
Return a new dataset restricted to the given columns (id is always kept).
- Parameters:
*columns (str) – Column names to keep.
- Return type:
Dataset[E]
- filter(expr)
Return a new dataset containing only rows matching the expression.
- Parameters:
expr (polars.Expr) – A polars expression, e.g. pl.col("weight") > 100.
- Return type:
Dataset[E]
- train_test_split(test_size=0.2, *, seed=42)
Split into train and test datasets by random shuffling.
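A seeded shuffle-and-cut split can be sketched with the standard library alone. The function shuffled_split below is hypothetical, and its rounding rule and random number generator are assumptions; the source does not state how the library rounds the test size or which generator it uses.

```python
import random


def shuffled_split(n_rows: int, test_size: float = 0.2, seed: int = 42):
    """Return (train_indices, test_indices) from a seeded shuffle of row indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # deterministic for a fixed seed
    n_test = round(n_rows * test_size)    # assumed rounding rule
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffled_split(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Because the generator is seeded, calling the function again with the same arguments reproduces the same partition, which is the usual reason for exposing a seed parameter.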
Specialized Datasets
- class MoleculeDataset
Bases: Dataset[Molecule[~Properties]], Generic[~Properties]
- class ReactionDataset
Bases: Dataset[Reaction[~Properties]], Generic[~Properties]
- class FragmentDataset
Bases: Dataset[Fragment[~Properties, ~FType]], Generic[~Properties, ~FType]