| type (category) | # rooms (int) | surface m² (float) | public transport (bool) | sold k€ (float) |
|-----------------|---------------|--------------------|-------------------------|-----------------|
| apartment       | 3             | 50                 | True                    | 450             |
| house           | 5             | 254                | False                   | 430             |
| duplex          | 4             | 68                 | True                    | 712             |
| apartment       | 3             | 32                 | False                   | 234             |
| apartment       | 3             | 33                 | True                    | ???             |
| house           | 4             | 210                | True                    | ???             |
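A minimal sketch of how the table above could be turned into a prediction task. The column names (`rooms`, `surface_m2`, `public_transport`, `sold_keur`) are invented for illustration, and the choice of a one-hot encoder followed by a random forest is an assumption, not something the slides prescribe. With four training rows the predictions are not meaningful; the point is only the shape of the workflow.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Rows with a known sale price, transcribed from the table (prices in k€).
train = pd.DataFrame({
    "type": ["apartment", "house", "duplex", "apartment"],
    "rooms": [3, 5, 4, 3],
    "surface_m2": [50.0, 254.0, 68.0, 32.0],
    "public_transport": [True, False, True, False],
    "sold_keur": [450.0, 430.0, 712.0, 234.0],
})

# Rows whose price is unknown ("???" in the table).
to_predict = pd.DataFrame({
    "type": ["apartment", "house"],
    "rooms": [3, 4],
    "surface_m2": [33.0, 210.0],
    "public_transport": [True, True],
})

features = ["type", "rooms", "surface_m2", "public_transport"]

# One-hot encode the categorical column; pass the other columns through unchanged.
preprocess = ColumnTransformer(
    [("type", OneHotEncoder(handle_unknown="ignore"), ["type"])],
    remainder="passthrough",
)
model = make_pipeline(
    preprocess,
    RandomForestRegressor(n_estimators=50, random_state=0),
)
model.fit(train[features], train["sold_keur"])

# One predicted price (k€) per row of `to_predict`.
predicted = model.predict(to_predict[features])
print(predicted)
```

The mixed column types (category, int, float, bool) are the reason for the explicit preprocessing step: the regressor itself only accepts numeric arrays.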
Some of the challenges:
Scikit-learn is the reference in machine learning, used by many companies. At OVHCloud, it is used in particular to monitor the 21,000-odd network devices in our data centers.
Olivier Nicol, Lead Data Scientist, OVHCloud
scikit-learn is the reference framework in machine learning and one of the first open-source frameworks of industrial quality.
Florian Douetteau, CEO, Dataiku
We use scikit-learn to speed up the reimbursement of car insurance claims and to detect insurance fraud. It is the Swiss Army knife of machine learning!
Marcin Detyniecki, Head of R&D, Axa
Among the advantages of scikit-learn are its ergonomics, its ease of use, and the quality of its documentation, which the whole community singles out for praise.
Mehdi Benchoufi, Chef de clinique (senior clinical fellow), Hôpital Hôtel Dieu
Widely used in: neurosciences, astronomy, geosciences, genomics, etc.
Easy to use:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)  # predicted labels for the test set
Easy to use:
from sklearn.svm import SVC
classifier = SVC()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)  # predicted labels for the test set
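The two snippets above differ only in the estimator that is imported and instantiated. A self-contained sketch of the same pattern follows; the iris dataset, the 25% test split, and the `scores` dictionary are illustrative choices that do not come from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small built-in dataset, split into training and held-out test parts.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for classifier in (RandomForestClassifier(random_state=0), SVC()):
    # Same fit / predict calls whatever the estimator.
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # Mean accuracy of the predictions on the held-out data.
    scores[type(classifier).__name__] = classifier.score(X_test, y_test)

print(scores)
```

Swapping in another classifier only changes the import and the constructor call; the rest of the loop stays the same.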
Launched in September 2018; a very fruitful collaboration after 4 years
companies: better visibility for software they rely on, good for Public Relations and recruitment
scikit-learn: hire maintainers to consolidate the project, useful feedback from advanced users
Someone else may solve your problems (Dask for distributed computing, conda-forge for user-friendly packaging, Python 3.11 for more user-friendly error messages …)
Many people work across projects: scikit-learn maintainers contribute upstream (NumPy, SciPy, Python, …) and downstream (Dask, dirty-cat, …)
IsolationForest, LocalOutlierFactor, and others
https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_anomaly_comparison.html
https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html
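A minimal sketch of the two detectors named above, run on synthetic data. The data generation (a Gaussian cluster plus a few uniformly scattered points) and all parameter values are assumptions made for illustration; the linked examples are the authoritative comparisons.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: a dense Gaussian cluster plus a few scattered points.
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=-8.0, high=8.0, size=(10, 2))
X = np.vstack([inliers, outliers])

# Both detectors return +1 for points judged normal and -1 for anomalies.
iso_labels = IsolationForest(random_state=0).fit_predict(X)
lof_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)

print("IsolationForest flagged:", int((iso_labels == -1).sum()))
print("LocalOutlierFactor flagged:", int((lof_labels == -1).sum()))
```

Neither detector uses labels here: both are fitted on the unlabeled array `X` alone, which is the usual setting for anomaly detection.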
https://github.com/ANSSI-FR/SecuML
ANSSI project: expert-in-the-loop graphical tool (annotation, model training, model evaluation)
scikit-learn used for:
innovation without having to ask first
Vision: Machine learning as a means not an end
Versatile library: the right level of abstraction. Close to research, but seeking different tradeoffs
NumPy arrays as data containers. Fast enough.
Ensure code quality and maintainability