Version 1.3.0

In Development

Legend for changelogs

Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Enhancement multiclass.OutputCodeClassifier.predict now uses a more efficient pairwise distance reduction. As a consequence, the tie-breaking strategy is different and thus the predicted labels may be different. #25196 by Guillaume Lemaitre.
Enhancement The fit_transform method of decomposition.DictionaryLearning is more efficient but may produce different results than in previous versions when transform_algorithm is not the same as fit_algorithm and the number of iterations is small. #24871 by Omar Salman.
Fix Small values in the W and H matrices are now treated more consistently during the fit and transform steps of decomposition.NMF and decomposition.MiniBatchNMF, which can produce different results than previous versions. #25438 by Yotam Avidar-Constantini.
Enhancement The sample_weight parameter is now used in centroid initialization for cluster.KMeans, cluster.BisectingKMeans and cluster.MiniBatchKMeans. This change breaks backward compatibility, since numbers generated from the same random seeds will be different. #25752 by Gleb Levitski, Jérémie du Boisberranger, Guillaume Lemaitre.

Changes impacting all modules

Enhancement The get_feature_names_out method of the following classes now raises a NotFittedError if the instance is not fitted. This ensures the error is consistent in all estimators with the get_feature_names_out method.
The NotFittedError displays an informative message asking to fit the instance with the appropriate arguments.

#25294, #25308, #25291, #25367, #25402 by John Pangas, Rahil Parikh, and Alex Buzenet.
Enhancement Added a multi-threaded Cython routine to compute squared Euclidean distances (sometimes followed by a fused reduction operation) for a pair of datasets consisting of a sparse CSR matrix and a dense NumPy array.

This can improve the performance of the following functions and estimators:
A typical example of this performance improvement happens when passing a sparse CSR matrix to the predict or transform method of estimators that rely on a dense NumPy representation to store their fitted parameters (or the reverse).

For instance, sklearn.neighbors.NearestNeighbors.kneighbors is now up to 2 times faster for this case on commonly available laptops.

#25044 by Julien Jerphanion.

Changelog

`sklearn.feature_selection`

Enhancement All selectors in sklearn.feature_selection will preserve a DataFrame’s dtype when transformed. #25102 by Thomas Fan.
Fix feature_selection.SequentialFeatureSelector’s cv parameter now supports generators. #25973 by Yao Xiao.

`sklearn.base`

Feature A __sklearn_clone__ protocol is now available to override the default behavior of base.clone. #24568 by Thomas Fan.

`sklearn.calibration`

Fix calibration.CalibratedClassifierCV now does not enforce sample alignment on fit_params. #25805 by Adrin Jalali.

`sklearn.cluster`

API Change The sample_weight parameter of cluster.KMeans.predict and cluster.MiniBatchKMeans.predict is now deprecated and will be removed in v1.5. #25251 by Gleb Levitski.
Enhancement The sample_weight parameter is now used in centroid initialization for cluster.KMeans, cluster.BisectingKMeans and cluster.MiniBatchKMeans. This change breaks backward compatibility, since numbers generated from the same random seeds will be different. #25752 by Gleb Levitski, Jérémie du Boisberranger, Guillaume Lemaitre.

`sklearn.datasets`

API Change The data_transposed argument of datasets.make_sparse_coded_signal is deprecated and will be removed in v1.5. #25784 by Jérémie du Boisberranger.

`sklearn.decomposition`

Enhancement decomposition.DictionaryLearning now accepts the parameter callback for consistency with the function decomposition.dict_learning. #24871 by Omar Salman.
Efficiency decomposition.MiniBatchDictionaryLearning and decomposition.MiniBatchSparsePCA are now faster for small batch sizes by avoiding duplicate validations. #25490 by Jérémie du Boisberranger.
Fix Small values in the W and H matrices are now treated more consistently during the fit and transform steps of decomposition.NMF and decomposition.MiniBatchNMF, which can produce different results than previous versions. #25438 by Yotam Avidar-Constantini.

`sklearn.ensemble`

Feature ensemble.HistGradientBoostingRegressor now supports the Gamma deviance loss via loss="gamma". Using the Gamma deviance as loss function comes in handy for modelling skewed, strictly positive targets. #22409 by Christian Lorentzen.
Feature Compute a custom out-of-bag score by passing a callable to ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor. #25177 by Tim Head.
Feature ensemble.GradientBoostingClassifier now exposes out-of-bag scores via the oob_scores_ or oob_score_ attributes. #24882 by Ashwin Mathur.
Efficiency ensemble.IsolationForest predict time is now faster (typically by a factor of 8 or more). Internally, the estimator now precomputes decision path lengths per tree at fit time. It is therefore not possible to load an estimator trained with scikit-learn 1.2 to make it predict with scikit-learn 1.3: retraining with scikit-learn 1.3 is required. #25186 by Felipe Breve Siola.
Enhancement ensemble.BaggingClassifier and ensemble.BaggingRegressor expose the allow_nan tag from the underlying estimator. #25506 by Thomas Fan.
Fix ensemble.RandomForestClassifier.fit sets max_samples = 1 when max_samples is a float and round(n_samples * max_samples) < 1. #25601 by Jan Fidor.
Fix ensemble.IsolationForest.fit no longer warns about missing feature names when called with contamination not "auto" on a pandas DataFrame. #25931 by Yao Xiao.

`sklearn.exceptions`

Feature Added exceptions.InconsistentVersionWarning which is raised when a scikit-learn estimator is unpickled with a scikit-learn version that is inconsistent with the scikit-learn version the estimator was pickled with. #25297 by Thomas Fan.

`sklearn.feature_extraction`

API Change feature_extraction.image.PatchExtractor now follows the transformer API of scikit-learn. This class is defined as a stateless transformer, meaning that it is not required to call fit before calling transform. Parameter validation only happens at fit time. #24230 by Guillaume Lemaitre.
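
The stateless behaviour can be sketched as follows, assuming scikit-learn 1.3 or later. The tiny synthetic images are an assumption for illustration.

```python
import numpy as np
from sklearn.feature_extraction.image import PatchExtractor

# Two tiny 4x4 grayscale "images" (synthetic data).
images = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)

extractor = PatchExtractor(patch_size=(2, 2))
# No prior call to fit: the transformer is stateless.
patches = extractor.transform(images)

# Each 4x4 image yields 3 * 3 = 9 overlapping 2x2 patches.
print(patches.shape)
```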

`sklearn.impute`

Enhancement Added the parameter fill_value to impute.IterativeImputer. #25232 by Thijs van Weezel.

`sklearn.inspection`

API Change inspection.partial_dependence returns a utils.Bunch with new key: grid_values. The values key is deprecated in favor of grid_values and the values key will be removed in 1.5. #21809 and #25732 by Thomas Fan.

`sklearn.linear_model`

Enhancement SGDClassifier, SGDRegressor and SGDOneClassSVM now preserve dtype for numpy.float32. #25587 by Omar Salman.

`sklearn.metrics`

Efficiency The computation of the expected mutual information in metrics.adjusted_mutual_info_score is now faster when the number of unique labels is large and its memory usage is reduced in general. #25713 by Kshitij Mathur, Guillaume Lemaitre, Omar Salman and Jérémie du Boisberranger.
Feature Adds zero_division=np.nan to multiple classification metrics: precision_score, recall_score, f1_score, fbeta_score, precision_recall_fscore_support, classification_report. When zero_division=np.nan and there is a zero division, the metric is undefined and is excluded from averaging. When not used for averages, the value returned is np.nan. #25531 by Marc Torrellas Socastro.
Fix metrics.manhattan_distances now supports readonly sparse datasets. #25432 by Julien Jerphanion.
Fix Fixed classification_report so that empty input will return np.nan. Previously, “macro avg” and “weighted avg” would return e.g. f1-score=np.nan and f1-score=0.0, being inconsistent. Now, they both return np.nan. #25531 by Marc Torrellas Socastro.
Fix metrics.ndcg_score now gives a meaningful error message for input of length 1. #25672 by Lene Preuss and Wei-Chun Chu.
Enhancement metrics.silhouette_samples now accepts a sparse matrix of pairwise distances between samples, or a feature array. #18723 by Sahil Gupta and #24677 by Ashwin Mathur.
Enhancement A new parameter drop_intermediate was added to metrics.precision_recall_curve, metrics.PrecisionRecallDisplay.from_estimator, metrics.PrecisionRecallDisplay.from_predictions, which drops some suboptimal thresholds to create lighter precision-recall curves. #24668 by @dberenbaum.
Fix log_loss raises a warning if the values of the parameter y_pred are not normalized, instead of actually normalizing them in the metric. Starting from 1.5 this will raise an error. #25299 by Omar Salman.
API Change The eps parameter of the log_loss has been deprecated and will be removed in 1.5. #25299 by Omar Salman.

`sklearn.model_selection`

Enhancement model_selection.cross_validate accepts a new parameter return_indices to return the train-test indices of each cv split. #25659 by Guillaume Lemaitre.

`sklearn.naive_bayes`

Fix naive_bayes.GaussianNB no longer raises a ZeroDivisionError when the provided sample_weight reduces the problem to a single class in fit. #24140 by Jonathan Ohayon and Chiara Marmo.

`sklearn.neighbors`

Fix Remove support for KulsinskiDistance in neighbors.BallTree. This dissimilarity is not a metric and cannot be supported by the BallTree. #25417 by Guillaume Lemaitre.
Enhancement The performance of neighbors.KNeighborsClassifier.predict and of neighbors.KNeighborsClassifier.predict_proba has been improved when n_neighbors is large and algorithm="brute" with non-Euclidean metrics. #24076 by Meekail Zain, Julien Jerphanion.

`sklearn.neural_network`

Fix neural_network.MLPRegressor and neural_network.MLPClassifier now report the right n_iter_ when warm_start=True. It corresponds to the number of iterations performed on the current call to fit instead of the total number of iterations performed since the initialization of the estimator. #25443 by Marvin Krawutschke.

`sklearn.pipeline`

Feature pipeline.FeatureUnion can now use indexing notation (e.g. feature_union["scalar"]) to access transformers by name. #25093 by Thomas Fan.
Feature pipeline.FeatureUnion can now access the feature_names_in_ attribute if the X value seen during .fit has a columns attribute and all columns are strings, e.g. when X is a pandas.DataFrame. #25220 by Ian Thompson.

`sklearn.preprocessing`

Major Feature Introduces preprocessing.TargetEncoder which is a categorical encoding based on target mean conditioned on the value of the category. #25334 by Thomas Fan.
Enhancement Adds a feature_name_combiner parameter to preprocessing.OneHotEncoder. This specifies a custom callable to create feature names to be returned by get_feature_names_out. The callable combines input arguments (input_feature, category) to a string. #22506 by Mario Kostelac.
Enhancement Added support for sample_weight in preprocessing.KBinsDiscretizer. This allows specifying the parameter sample_weight for each sample to be used while fitting. The option is only available when strategy is set to quantile or kmeans. #24935 by Seladus, Guillaume Lemaitre, and Dea María Léon, #25257 by Gleb Levitski.
Feature preprocessing.OrdinalEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. #25677 by Thomas Fan.
Fix AdditiveChi2Sampler is now stateless. The sample_interval_ attribute is deprecated and will be removed in 1.5. #25190 by Vincent Maladière.

`sklearn.tree`

Enhancement Adds a class_names parameter to tree.export_text. This allows specifying a name for each target class, in ascending numerical order. #25387 by William M and crispinlogan.

`sklearn.utils`

API Change estimator_checks.check_transformers_unfitted_stateless has been introduced to ensure stateless transformers don’t raise NotFittedError during transform with no prior call to fit or fit_transform. #25190 by Vincent Maladière.
API Change A FutureWarning is now raised when instantiating a class which inherits from a deprecated base class (i.e. decorated by utils.deprecated) and which overrides the __init__ method. #25733 by Brigitta Sipőcz and Jérémie du Boisberranger.
Fix Fixes utils.validation.check_array to properly convert pandas extension arrays. #25813 by Thomas Fan.
Fix utils.validation.check_array now supports pandas DataFrames with extension arrays and object dtypes by returning an ndarray with object dtype. #25814 by Thomas Fan.

`sklearn.semi_supervised`

Enhancement LabelSpreading.fit and LabelPropagation.fit now accept sparse metrics. #19664 by Kaushik Amar Das.

Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.2, including:

TODO: update at the time of the release.