Contributing to Sklearn-Optuna¶
Thank you for your interest in contributing to Sklearn-Optuna! This document provides guidelines for contributing to the project.
Code of Conduct¶
We are committed to providing a welcoming and inclusive environment for all contributors. Please be respectful and considerate in all interactions.
Getting Started¶
Prerequisites¶
Development Setup¶
- Fork the repository on GitHub
- Clone your fork:
- Install dependencies:
- Install pre-commit hooks:
Development Workflow¶
Making Changes¶
- Create a new branch:
- Make your changes
- Run tests:
- Format and fix code:
- Commit your changes:
We follow Conventional Commits for commit messages. The commit message format is enforced by commitizen pre-commit hooks, which will validate your commit messages automatically.
Valid commit message examples:
feat: add new feature
fix: resolve bug in calculation
docs: update installation guide
chore: update dependencies
test: add tests for new feature
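The header format can be illustrated with a small check. This is a minimal sketch, assuming the commit types listed in this guide; the regular expression and the `is_conventional` helper are hypothetical and are not commitizen's actual implementation.

```python
import re

# Commit types listed in this guide; the enforced set lives in the
# commitizen configuration and may differ.
TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "perf", "ci")

# <type>[(scope)][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of ``message`` matches the sketch above."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_RE.match(header) is not None
```

For instance, `is_conventional("fix(parser): resolve bug")` passes, while a message without a type prefix or without the space after the colon does not.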
Running Tests¶
Sklearn-Optuna uses pytest with markers to categorize tests into different types:
- Fast tests: Unit tests that run quickly without subprocess calls or heavy I/O
- Slow tests: Tests marked with @pytest.mark.slow that take longer to execute
- Integration tests: Tests marked with @pytest.mark.integration that run subprocesses or test multiple components together
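Custom markers have to be registered, otherwise pytest warns about unknown marks. The following is a sketch of one way to do that in a `conftest.py`, using the standard `pytest_configure` hook. The marker descriptions are assumptions, and this project may register its markers elsewhere (for example in its pytest configuration).

```python
# conftest.py (sketch): register the custom markers used in this guide.
# The descriptions below are illustrative, not copied from the project.
MARKERS = {
    "slow": "tests that take longer to execute",
    "integration": "tests that run subprocesses or exercise multiple components",
    "example": "tests that execute the example notebooks",
}


def pytest_configure(config):
    """Standard pytest hook, called once at startup."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, `pytest -m "not slow and not integration"` is one way to select only the fast tests; the project's own commands may wrap this differently.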
Test Commands¶
Run fast tests only (recommended during development):
Run slow and integration tests:
Run all tests:
Run tests with coverage:
Run tests across multiple Python versions:
Run example notebook tests:
This runs all notebooks in the examples/ directory as Python scripts in parallel using pytest-xdist (-n auto). Each notebook is executed non-interactively to validate it runs without errors.
When to Mark Tests as Slow or Integration¶
Mark your tests appropriately to help maintain fast feedback during development:
- Use @pytest.mark.slow for tests that:
  - Take more than a few seconds to run
  - Perform heavy computations
  - Make network requests
  - Access external resources
- Use @pytest.mark.integration for tests that:
  - Run subprocess commands
  - Test multiple components working together
  - Require complex setup or teardown
  - Exercise end-to-end workflows
- @pytest.mark.example is used in tests/test_examples.py to:
  - Validate example notebooks execute without errors
  - Run notebooks in the examples/ directory
  - Test interactive documentation and tutorials
Example:
import pytest


@pytest.mark.slow
def test_large_computation():
    # Long-running test
    pass


@pytest.mark.integration
@pytest.mark.slow
def test_end_to_end_workflow():
    # Complex integration test
    pass
Test Organization¶
Follow these conventions when writing tests:
Class-based test structure: Group related tests into classes using the Test<Component><Scenario> naming pattern.
Fixture usage: Prefer fixtures from conftest.py over module-level data. See tests/conftest.py for available factories.
Property-based testing: Hypothesis is available for property-based testing of edge cases and invariants.
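As an illustration of the class-based naming pattern, here is a minimal sketch. The `clip` function and both test classes are hypothetical and exist only to show how `Test<Component><Scenario>` names read; they are not part of this project.

```python
def clip(value: float, low: float, high: float) -> float:
    """Hypothetical function under test: clamp ``value`` to ``[low, high]``."""
    return max(low, min(value, high))


class TestClipWithinBounds:
    """Component: ``clip``. Scenario: the input is already inside the range."""

    def test_returns_value_unchanged(self):
        assert clip(0.5, 0.0, 1.0) == 0.5


class TestClipOutOfBounds:
    """Component: ``clip``. Scenario: the input lies outside the range."""

    def test_clamps_to_lower_bound(self):
        assert clip(-3.0, 0.0, 1.0) == 0.0

    def test_clamps_to_upper_bound(self):
        assert clip(7.0, 0.0, 1.0) == 1.0
```

Grouping by scenario keeps each class small, and pytest collects the classes without any base class or decorator.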
CI Test Strategy¶
The CI pipeline uses a two-tier testing strategy optimized for fast feedback:
- Fast tests (test-fast job): Runs on the minimum and maximum Python versions (3.11, 3.14) only:
  - Draft PRs: Ubuntu only, for quick feedback in ~2-3 minutes
  - Ready PRs/Main: All operating systems (Ubuntu, Windows, macOS), for cross-platform validation
- Full test suite (test-full job): Runs all tests (fast + slow + integration) on Ubuntu across all Python versions (3.11-3.14). It runs for PRs that are not drafts and for the main branch. This comprehensive validation includes coverage reporting on the minimum supported Python version.
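The two tiers can be summarized as a small model. This is only a sketch of the rules described above, not the actual workflow definition; the function names are hypothetical, and the expansion of "3.11-3.14" into four minor versions is an assumption.

```python
ALL_PYTHONS = ["3.11", "3.12", "3.13", "3.14"]  # assumed expansion of 3.11-3.14


def fast_test_matrix(is_draft: bool) -> list[tuple[str, str]]:
    """(os, python) pairs for the fast tier: min and max Python versions only."""
    pythons = [ALL_PYTHONS[0], ALL_PYTHONS[-1]]
    systems = ["ubuntu"] if is_draft else ["ubuntu", "windows", "macos"]
    return [(system, python) for system in systems for python in pythons]


def full_test_matrix(is_draft: bool) -> list[tuple[str, str]]:
    """(os, python) pairs for the full tier: skipped entirely for draft PRs."""
    if is_draft:
        return []
    return [("ubuntu", python) for python in ALL_PYTHONS]
```

Pushes to the main branch behave like a non-draft PR in this model, so both functions would be called with `is_draft=False`.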
Code Quality¶
Run linters and type checkers:
Format code and fix issues:
Run all quality checks:
Docstring Standards¶
All public functions, methods, and classes require NumPy-style docstrings. Coverage is enforced at 100% by interrogate.
Check docstring coverage:
Required sections (as applicable):
- Parameters: All function/method parameters with types and descriptions
- Returns: Return value type and description
- Raises: Exceptions raised
- See Also: Related classes/functions
- References: Academic references for algorithms or methods used
- Notes: Implementation details, mathematical background
- Examples: Usage examples (tested via pytest --doctest-modules)
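A docstring covering several of these sections might look like the sketch below. The `scale` function is hypothetical and only demonstrates the layout; sections that do not apply (References, Notes) are simply omitted.

```python
def scale(values, factor=1.0):
    """Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values.
    factor : float, default=1.0
        Multiplier applied to every element.

    Returns
    -------
    list of float
        The scaled values, in the original order.

    Raises
    ------
    ValueError
        If ``factor`` is NaN.

    See Also
    --------
    `sum` : Built-in that adds the values instead of scaling them.

    Examples
    --------
    >>> scale([1.0, 2.0], factor=3.0)
    [3.0, 6.0]
    """
    if factor != factor:  # NaN is the only float that is not equal to itself
        raise ValueError("factor must not be NaN")
    return [value * factor for value in values]
```

The Examples section doubles as a doctest, so the displayed output has to match what the function actually returns.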
See Also format:
Use standard numpydoc format with short backtick names. The mkdocs-autorefs plugin automatically links backtick references (e.g., `ClassName`) to the corresponding API pages in rendered documentation. This means plain backtick-wrapped names in docstrings become clickable links in the docs site without any special syntax.
For hyperlinks, always use Markdown syntax: [text](url).
Documentation¶
Build documentation:
Serve documentation locally:
View all available commands:
Adding Examples¶
All examples are interactive marimo notebooks that combine code, markdown, and visualizations.
Creating a Notebook¶
Create a new marimo notebook in examples/<name>.py:
Required Structure¶
Every example notebook must follow this structure in order:
- Title: A top-level # Title heading describing the notebook topic
- What You'll Learn: A ## What You'll Learn section with a bulleted list of concrete learning goals
- Prerequisites: A ## Prerequisites section stating required prior knowledge (one-liner or short bullet list). For standalone dataset explorations, use "None: this is a standalone dataset exploration."
- Numbered sections: Main content as ## 1. Section Name, ## 2. Section Name, etc.
- Key Takeaways: A ## Key Takeaways section with bullet points summarizing important lessons learned
- Next Steps: A ## Next Steps section with bullet points linking to related notebooks or documentation
Example intro cell:
# Reduction Forecasting with sklearn
## What You'll Learn
- How `PointReductionForecaster` tabularizes time series data using lag features
- The difference between `target_transformer` and `feature_transformer` parameters
- Tuning hyperparameters with `GridSearchCV`
## Prerequisites
Basic familiarity with sklearn's fit/predict API and time series concepts (trend, seasonality).
Marimo Cell Conventions¶
- Use hide_code=True on all markdown cells, import cells, and utility/helper cells
- Use r"""...""" (raw triple-quoted strings) for markdown cell content
- All notebooks declare dependencies using PEP 723 inline script metadata at the top of the file:
- Dependencies are sorted alphabetically and only list third-party packages actually imported by the notebook.
  - marimo itself is NOT listed as a dependency (it is the runner, not a dependency of the script).
  - To add a dependency: uv add --script notebook.py <package> or edit the header manually.
  - To run in an isolated sandbox: uv run marimo edit --sandbox notebook.py.
- Group all imports into a single hidden cell after the metadata header
Content Guidelines¶
- Gallery metadata: Every example notebook should include a __gallery__ variable in the first @app.cell defining title, description, and category for the example gallery.
- Markdown density: Each numbered section should open with a descriptive markdown cell explaining the concept before any code cells. Consecutive code cells within the same section are acceptable when logically grouped.
- No emojis: Do not use emojis anywhere in notebooks, including headings, content bullets, and concluding remarks.
- API cross-links: When mentioning sklearn_optuna classes or functions in markdown cells, wrap them in backtick-link syntax pointing to the API page (e.g., [`SeasonalNaive`](/pages/api/generated/sklearn_optuna.point.naive.SeasonalNaive/)).
- Key Takeaways format: Use bold for key terms with plain descriptions (e.g., - **Reduction forecasting** converts time series into tabular regression via lag features)
- Next Steps format: Use bold labels with linked notebook references (e.g., - **Naive baselines**: See [naive_forecasters.py](../../../examples/point/naive_forecasters/) to compare). Always link to the rendered example page, not the raw file.
Testing and Documentation¶
Run the example test suite to verify your notebook passes:
Add a link to your example in docs/pages/tutorials/examples.md:
The mkdocs hooks automatically export notebooks to HTML during docs build. All notebooks in examples/ are automatically discovered and tested by test_examples.py using pytest's parametrization feature, which runs them in parallel for fast validation.
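The discover-and-run pattern described above can be sketched with the standard library alone. The helper names, the timeout, and the commented test function are assumptions for illustration; the real `test_examples.py` may be organized differently.

```python
import subprocess
import sys
from pathlib import Path


def discover_notebooks(root: Path) -> list[Path]:
    """Return every ``.py`` file under ``root`` in a stable order."""
    return sorted(root.rglob("*.py"))


def run_notebook(path: Path, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run a notebook non-interactively as a plain script, capturing output."""
    return subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# In a test module, the discovered paths would typically feed parametrization:
#
#   @pytest.mark.example
#   @pytest.mark.parametrize("path", discover_notebooks(Path("examples")), ids=str)
#   def test_notebook_runs(path):
#       result = run_notebook(path)
#       assert result.returncode == 0, result.stderr
```

Because each notebook becomes its own parametrized test case, pytest-xdist can distribute the cases across workers.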
Before You Open a PR¶
- Run just test-fast: all fast tests pass
- Run just fix: code is formatted and linted
- Write or update tests for your changes
- If you changed docs, run just serve and verify they render
- Use conventional commit messages
- Keep the PR focused on a single concern
Submitting Changes¶
- Push your changes to your fork:
- Open a Pull Request on GitHub
- Ensure all CI checks pass
- Wait for review and address any feedback
Pull Request Guidelines¶
- Write clear, descriptive PR titles following Conventional Commits
- Include a description of the changes
- Add tests for new functionality
- Update documentation as needed
- Ensure all tests pass
- Keep PRs focused and atomic
Commit Message Convention¶
We use Conventional Commits enforced by commitizen:
- feat: New features (triggers minor version bump)
- fix: Bug fixes (triggers patch version bump)
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
- perf: Performance improvements
- ci: CI/CD changes
Breaking changes: Add ! after the type or add BREAKING CHANGE: in the footer to trigger a major version bump.
Example with scope:
Example with breaking change:
git commit -m "feat!: redesign authentication system

BREAKING CHANGE: authentication now requires API keys instead of passwords"
The pre-commit hook will validate your commit messages and prevent commits that don't follow the convention.
Release Process¶
Maintainers only
The release process is managed by project maintainers. Contributors do not need to create releases. Open PRs and a maintainer will handle versioning and publishing.
Releases are fully automated through GitHub Actions when a new tag is pushed, with a manual approval gate before publishing to PyPI to ensure quality control.
graph LR
A[Push Tag<br/>v*.*.*] --> B[changelog.yml]
B --> C[Generate<br/>CHANGELOG.md]
B --> D[Build Package<br/>validation]
C --> E[Create PR]
E --> F[Review & Merge<br/>PR]
F --> G[publish-release.yml]
G --> H[Create GitHub<br/>Release]
H --> I{Manual<br/>Approval}
I -->|Approve| J[Publish to PyPI]
style I fill:#f59e0b,stroke:#333,stroke-width:2px,color:#fff
style J fill:#10b981,stroke:#333,stroke-width:2px,color:#fff
How It Works¶
- Tag a release:
  git tag v0.2.0 -m "Release v0.2.0"
  git push origin v0.2.0
- Automated changelog workflow (changelog.yml):
  - Generates changelog from conventional commits using git-cliff
  - Creates a Pull Request with the updated CHANGELOG.md
  - Builds the package distributions (wheels and sdist) for immediate validation
  - Stores distributions as workflow artifacts (reused later to avoid rebuilding)
- Review and merge the changelog PR:
  - A maintainer reviews the generated changelog
  - Once approved, merge the PR to main
- Automated release workflow (publish-release.yml):
  - Creates a GitHub Release with generated release notes
  - Attaches distribution files to the release
  - Waits for manual approval before proceeding to PyPI
- Manual approval for PyPI publishing:
  - Designated reviewers receive a notification
  - Review the GitHub Release to verify everything is correct
  - Approve the deployment to publish to PyPI
  - Package is published using Trusted Publishing (OIDC, no tokens needed)
- Release notes generation:
  - All commits since the last tag are analyzed
  - Commits are grouped by type (Added, Fixed, Documentation, etc.)
  - Only commits following conventional format are included
  - Breaking changes are highlighted
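The grouping step can be pictured with a short sketch. The real work is done by git-cliff according to the project's configuration; the type-to-section mapping, the "Other" fallback, and the `[breaking]` prefix below are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed mapping from commit type to changelog section.
SECTIONS = {"feat": "Added", "fix": "Fixed", "docs": "Documentation"}

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: (?P<desc>.+)$"
)


def group_commits(subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subject lines by changelog section."""
    groups = defaultdict(list)
    for subject in subjects:
        match = HEADER_RE.match(subject)
        if match is None:
            continue  # commits that are not in conventional format are left out
        section = SECTIONS.get(match["type"], "Other")
        entry = match["desc"]
        if match["breaking"]:
            entry = f"[breaking] {entry}"  # highlight breaking changes
        groups[section].append(entry)
    return dict(groups)
```

A subject such as "update readme" has no type prefix, so it does not appear in any section.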
Version Numbering¶
This project uses Semantic Versioning:
- Major (1.0.0): Breaking changes
- Minor (0.1.0): New features (backward compatible)
- Patch (0.0.1): Bug fixes (backward compatible)
Use conventional commits to communicate the type of change, and select the appropriate version number when tagging.
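The relationship between commit types and version numbers can be sketched as follows. Maintainers choose the version when tagging, so this is purely illustrative; `required_bump` and `next_version` are hypothetical helpers, and the "v" tag prefix follows the tag example in this guide.

```python
import re

HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: ")
ORDER = ["none", "patch", "minor", "major"]


def required_bump(messages: list[str]) -> str:
    """Return the largest bump implied by a list of full commit messages."""
    level = "none"
    for message in messages:
        header = message.splitlines()[0] if message.strip() else ""
        match = HEADER_RE.match(header)
        if match is None:
            continue  # not a conventional commit
        if match["breaking"] or "BREAKING CHANGE:" in message:
            candidate = "major"
        elif match["type"] == "feat":
            candidate = "minor"
        elif match["type"] == "fix":
            candidate = "patch"
        else:
            candidate = "none"
        if ORDER.index(candidate) > ORDER.index(level):
            level = candidate
    return level


def next_version(current: str, bump: str) -> str:
    """Apply a bump to a tag such as ``v0.1.3``."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    return f"v{major}.{minor}.{patch}"
```

For example, a history containing one `fix:` and one `feat:` commit implies a minor bump, since the largest change wins.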
Questions?¶
If you have any questions, feel free to:
Thank you for contributing! 🎉