Version 1.3.0¶
In Development
Legend for changelogs¶
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Changed models¶
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Enhancement multiclass.OutputCodeClassifier.predict now uses a more efficient pairwise distance reduction. As a consequence, the tie-breaking strategy is different and thus the predicted labels may be different. #25196 by Guillaume Lemaitre.
Enhancement The fit_transform method of decomposition.DictionaryLearning is more efficient but may produce results different from previous versions when transform_algorithm is not the same as fit_algorithm and the number of iterations is small. #24871 by Omar Salman.
Fix Treat more consistently small values in the W and H matrices during the fit and transform steps of decomposition.NMF and decomposition.MiniBatchNMF, which can produce different results than previous versions. #25438 by Yotam Avidar-Constantini.
Enhancement The sample_weight parameter is now used in centroid initialization for cluster.KMeans, cluster.BisectingKMeans and cluster.MiniBatchKMeans. This change breaks backward compatibility, since numbers generated from the same random seeds will be different. #25752 by Gleb Levitski, Jérémie du Boisberranger, Guillaume Lemaitre.
Changes impacting all modules¶
Enhancement The get_feature_names_out method of the following classes now raises a NotFittedError if the instance is not fitted. This ensures the error is consistent in all estimators with the get_feature_names_out method. The NotFittedError displays an informative message asking to fit the instance with the appropriate arguments. #25294, #25308, #25291, #25367, #25402 by John Pangas, Rahil Parikh, and Alex Buzenet.
Enhancement Added a multi-threaded Cython routine to compute the squared Euclidean distances (sometimes followed by a fused reduction operation) for a pair of datasets consisting of a sparse CSR matrix and a dense NumPy array.
This can improve the performance of the following functions and estimators:
A typical example of this performance improvement happens when passing a sparse CSR matrix to the predict or transform method of estimators that rely on a dense NumPy representation to store their fitted parameters (or the reverse).
For instance, sklearn.neighbors.NearestNeighbors.kneighbors is now up to 2 times faster for this case on commonly available laptops.
Changelog¶
sklearn.feature_selection¶
Enhancement All selectors in sklearn.feature_selection will preserve a DataFrame’s dtype when transformed. #25102 by Thomas Fan.
Fix feature_selection.SequentialFeatureSelector’s cv parameter now supports generators. #25973 by Yao Xiao.
sklearn.base¶
Feature A __sklearn_clone__ protocol is now available to override the default behavior of base.clone. #24568 by Thomas Fan.
sklearn.calibration¶
Fix calibration.CalibratedClassifierCV no longer enforces sample alignment on fit_params. #25805 by Adrin Jalali.
sklearn.cluster¶
API Change The sample_weight parameter of cluster.KMeans.predict and cluster.MiniBatchKMeans.predict is now deprecated and will be removed in v1.5. #25251 by Gleb Levitski.
Enhancement The sample_weight parameter is now used in centroid initialization for cluster.KMeans, cluster.BisectingKMeans and cluster.MiniBatchKMeans. This change breaks backward compatibility, since numbers generated from the same random seeds will be different. #25752 by Gleb Levitski, Jérémie du Boisberranger, Guillaume Lemaitre.
sklearn.datasets¶
API Change The data_transposed argument of datasets.make_sparse_coded_signal is deprecated and will be removed in v1.5. #25784 by Jérémie du Boisberranger.
sklearn.decomposition¶
Enhancement decomposition.DictionaryLearning now accepts the parameter callback for consistency with the function decomposition.dict_learning. #24871 by Omar Salman.
Efficiency decomposition.MiniBatchDictionaryLearning and decomposition.MiniBatchSparsePCA are now faster for small batch sizes by avoiding duplicate validations. #25490 by Jérémie du Boisberranger.
Fix Treat more consistently small values in the W and H matrices during the fit and transform steps of decomposition.NMF and decomposition.MiniBatchNMF, which can produce different results than previous versions. #25438 by Yotam Avidar-Constantini.
sklearn.ensemble¶
Feature ensemble.HistGradientBoostingRegressor now supports the Gamma deviance loss via loss="gamma". Using the Gamma deviance as loss function comes in handy for modelling skewed-distributed, strictly positive valued targets. #22409 by Christian Lorentzen.
Feature Compute a custom out-of-bag score by passing a callable to ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor. #25177 by Tim Head.
Feature ensemble.GradientBoostingClassifier now exposes out-of-bag scores via the oob_scores_ or oob_score_ attributes. #24882 by Ashwin Mathur.
Efficiency ensemble.IsolationForest predict time is now faster (typically by a factor of 8 or more). Internally, the estimator now precomputes decision path lengths per tree at fit time. It is therefore not possible to load an estimator trained with scikit-learn 1.2 to make it predict with scikit-learn 1.3: retraining with scikit-learn 1.3 is required. #25186 by Felipe Breve Siola.
Enhancement ensemble.BaggingClassifier and ensemble.BaggingRegressor expose the allow_nan tag from the underlying estimator. #25506 by Thomas Fan.
Fix ensemble.RandomForestClassifier.fit sets max_samples = 1 when max_samples is a float and round(n_samples * max_samples) < 1. #25601 by Jan Fidor.
Fix ensemble.IsolationForest.fit no longer warns about missing feature names when called with contamination not "auto" on a pandas dataframe. #25931 by Yao Xiao.
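The callable out-of-bag score can be sketched as follows; the choice of metric here (mean_absolute_error) is only an illustration, any callable with signature metric(y_true, y_pred) should work:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=200, random_state=0)
# Passing a callable instead of a bool to oob_score computes a
# custom score on the out-of-bag predictions during fit.
forest = RandomForestRegressor(
    n_estimators=30, oob_score=mean_absolute_error, random_state=0
)
forest.fit(X, y)
score = forest.oob_score_  # value of the callable on out-of-bag predictions
```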
sklearn.exceptions¶
Feature Added exceptions.InconsistentVersionWarning, which is raised when a scikit-learn estimator is unpickled with a scikit-learn version that is inconsistent with the scikit-learn version the estimator was pickled with. #25297 by Thomas Fan.
sklearn.feature_extraction¶
API Change feature_extraction.image.PatchExtractor now follows the transformer API of scikit-learn. This class is defined as a stateless transformer, meaning that it is not required to call fit before calling transform. Parameter validation only happens at fit time. #24230 by Guillaume Lemaitre.
sklearn.impute¶
Enhancement Added the parameter fill_value to impute.IterativeImputer. #25232 by Thijs van Weezel.
sklearn.inspection¶
API Change inspection.partial_dependence returns a utils.Bunch with a new key: grid_values. The values key is deprecated in favor of grid_values and will be removed in 1.5. #21809 and #25732 by Thomas Fan.
sklearn.linear_model¶
Enhancement SGDClassifier, SGDRegressor and SGDOneClassSVM now preserve dtype for numpy.float32. #25587 by Omar Salman.
sklearn.metrics¶
Efficiency The computation of the expected mutual information in metrics.adjusted_mutual_info_score is now faster when the number of unique labels is large, and its memory usage is reduced in general. #25713 by Kshitij Mathur, Guillaume Lemaitre, Omar Salman and Jérémie du Boisberranger.
Feature Adds zero_division=np.nan to multiple classification metrics: precision_score, recall_score, f1_score, fbeta_score, precision_recall_fscore_support, classification_report. When zero_division=np.nan and there is a zero division, the metric is undefined and is excluded from averaging. When not used for averages, the value returned is np.nan. #25531 by Marc Torrellas Socastro.
Fix metrics.manhattan_distances now supports readonly sparse datasets. #25432 by Julien Jerphanion.
Fix Fixed classification_report so that empty input will return np.nan. Previously, “macro avg” and “weighted avg” would return e.g. f1-score=np.nan and f1-score=0.0, being inconsistent. Now, they both return np.nan. #25531 by Marc Torrellas Socastro.
Fix metrics.ndcg_score now gives a meaningful error message for input of length 1. #25672 by Lene Preuss and Wei-Chun Chu.
Enhancement metrics.silhouette_samples now accepts a sparse matrix of pairwise distances between samples, or a feature array. #18723 by Sahil Gupta and #24677 by Ashwin Mathur.
Enhancement A new parameter drop_intermediate was added to metrics.precision_recall_curve, metrics.PrecisionRecallDisplay.from_estimator and metrics.PrecisionRecallDisplay.from_predictions, which drops some suboptimal thresholds to create lighter precision-recall curves. #24668 by dberenbaum.
Fix log_loss raises a warning if the values of the parameter y_pred are not normalized, instead of actually normalizing them in the metric. Starting from 1.5 this will raise an error. #25299 by Omar Salman.
API Change The eps parameter of log_loss has been deprecated and will be removed in 1.5. #25299 by Omar Salman.
sklearn.model_selection¶
Enhancement model_selection.cross_validate accepts a new parameter return_indices to return the train-test indices of each cv split. #25659 by Guillaume Lemaitre.
sklearn.naive_bayes¶
Fix naive_bayes.GaussianNB no longer raises a ZeroDivisionError when the provided sample_weight reduces the problem to a single class in fit. #24140 by Jonathan Ohayon and Chiara Marmo.
sklearn.neighbors¶
Fix Remove support for KulsinskiDistance in neighbors.BallTree. This dissimilarity is not a metric and cannot be supported by the BallTree. #25417 by Guillaume Lemaitre.
Enhancement The performance of neighbors.KNeighborsClassifier.predict and of neighbors.KNeighborsClassifier.predict_proba has been improved when n_neighbors is large and algorithm="brute" with non-Euclidean metrics. #24076 by Meekail Zain, Julien Jerphanion.
sklearn.neural_network¶
Fix neural_network.MLPRegressor and neural_network.MLPClassifier report the correct n_iter_ when warm_start=True. It corresponds to the number of iterations performed on the current call to fit instead of the total number of iterations performed since the initialization of the estimator. #25443 by Marvin Krawutschke.
sklearn.pipeline¶
Feature pipeline.FeatureUnion can now use indexing notation (e.g. feature_union["scalar"]) to access transformers by name. #25093 by Thomas Fan.
Feature pipeline.FeatureUnion can now access the feature_names_in_ attribute if the X value seen during .fit has a columns attribute and all columns are strings, e.g. when X is a pandas.DataFrame. #25220 by Ian Thompson.
sklearn.preprocessing¶
Major Feature Introduces preprocessing.TargetEncoder, which is a categorical encoding based on the target mean conditioned on the value of the category. #25334 by Thomas Fan.
Enhancement Adds a feature_name_combiner parameter to preprocessing.OneHotEncoder. This specifies a custom callable to create feature names to be returned by get_feature_names_out. The callable combines input arguments (input_feature, category) to a string. #22506 by Mario Kostelac.
Enhancement Added support for sample_weight in preprocessing.KBinsDiscretizer. This allows specifying the parameter sample_weight for each sample to be used while fitting. The option is only available when strategy is set to quantile and kmeans. #24935 by Seladus, Guillaume Lemaitre, and Dea María Léon, #25257 by Gleb Levitski.
Feature preprocessing.OrdinalEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. #25677 by Thomas Fan.
Fix AdditiveChi2Sampler is now stateless. The sample_interval_ attribute is deprecated and will be removed in 1.5. #25190 by Vincent Maladière.
sklearn.tree¶
Enhancement Adds a class_names parameter to tree.export_text. This allows specifying the parameter class_names for each target class in ascending numerical order. #25387 by William M and crispinlogan.
sklearn.utils¶
API Change estimator_checks.check_transformers_unfitted_stateless has been introduced to ensure stateless transformers don’t raise NotFittedError during transform with no prior call to fit or fit_transform. #25190 by Vincent Maladière.
API Change A FutureWarning is now raised when instantiating a class which inherits from a deprecated base class (i.e. decorated by utils.deprecated) and which overrides the __init__ method. #25733 by Brigitta Sipőcz and Jérémie du Boisberranger.
Fix Fixes utils.validation.check_array to properly convert pandas extension arrays. #25813 by Thomas Fan.
Fix utils.validation.check_array now supports pandas DataFrames with extension arrays and object dtypes by returning an ndarray with object dtype. #25814 by Thomas Fan.
sklearn.semi_supervised¶
Enhancement LabelSpreading.fit and LabelPropagation.fit now accept sparse matrices. #19664 by Kaushik Amar Das.
Code and Documentation Contributors¶
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.2, including:
TODO: update at the time of the release.