Skip to content

How to Use in Pipelines

Interactive notebook

See the companion notebook for a runnable example. View ยท Open in marimo

This guide shows you how to use OptunaSearchCV with Scikit-Learn pipelines. Use this when you need to tune hyperparameters of estimators inside a pipeline, or nest OptunaSearchCV as a pipeline step.

Prerequisites

  • Sklearn-Optuna installed (Getting Started)
  • Familiarity with Scikit-Learn's Pipeline

Tune Pipeline Sub-Estimator Parameters

Use __ (double underscore) syntax to address parameters of estimators nested inside a pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from optuna.distributions import FloatDistribution

from sklearn_optuna import OptunaSearchCV

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=200)),
])

param_distributions = {
    "clf__C": FloatDistribution(1e-2, 10.0, log=True),
}

search = OptunaSearchCV(pipe, param_distributions, n_trials=20)
search.fit(X, y)

The search tunes C on the LogisticRegression step while applying StandardScaler in each cross-validation fold.

Nest OptunaSearchCV Inside a Pipeline

OptunaSearchCV is a valid Scikit-Learn estimator, so you can use it as a pipeline step:

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA

search = OptunaSearchCV(
    LogisticRegression(max_iter=200),
    {"C": FloatDistribution(1e-2, 10.0, log=True)},
    n_trials=10,
)

pipe = Pipeline([
    ("pca", PCA(n_components=5)),
    ("search", search),
])
pipe.fit(X, y)

Tune Parameters Across Multiple Pipeline Steps

Address parameters from different steps in the same search:

from optuna.distributions import IntDistribution

pipe = Pipeline([
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=200)),
])

param_distributions = {
    "pca__n_components": IntDistribution(2, 10),
    "clf__C": FloatDistribution(1e-2, 10.0, log=True),
}

search = OptunaSearchCV(pipe, param_distributions, n_trials=30)
search.fit(X, y)

Manage Computational Cost

Nested searches multiply the total number of evaluations. An outer search with 10 trials wrapping an inner search with 5 trials produces 50 total evaluations (plus cross-validation folds). Keep n_trials low on the inner search and increase only after initial exploration.

See Also