# Set and get hyperparameters in scikit-learn

The process of learning a predictive model is driven by a set of internal parameters and a set of training data. These internal parameters are called hyperparameters and are specific to each family of models. In addition, the optimal set of hyperparameters is specific to each dataset, and thus they need to be tuned.

**Note:** In this notebook we use the words "hyperparameters" and "parameters" interchangeably.

This notebook shows how one can get and set the value of a hyperparameter in a scikit-learn estimator. Recall that hyperparameters refer to the parameters that control the learning process. They should not be confused with the fitted parameters that result from training. Fitted parameters are recognizable in scikit-learn because they are spelled with a final underscore `_`, for instance `model.coef_`.

We start by loading the adult census dataset and only use the numerical features.

```python
import pandas as pd

adult_census = pd.read_csv("../datasets/adult-census.csv")

target_name = "class"
numerical_columns = ["age", "capital-gain", "capital-loss", "hours-per-week"]

target = adult_census[target_name]
data = adult_census[numerical_columns]
```

Our data is only numerical.

```python
data.head()
```

|   | age | capital-gain | capital-loss | hours-per-week |
|---|-----|--------------|--------------|----------------|
| 0 | 25  | 0    | 0 | 40 |
| 1 | 38  | 0    | 0 | 50 |
| 2 | 28  | 0    | 0 | 40 |
| 3 | 44  | 7688 | 0 | 40 |
| 4 | 18  | 0    | 0 | 30 |
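To make the distinction between hyperparameters and fitted parameters concrete, here is a minimal sketch (an addition, not part of the original notebook): `C` is chosen before training, while `coef_` only exists after calling `fit`.

```python
# Sketch (not in the original notebook): contrast a hyperparameter, chosen
# before training, with a fitted parameter, estimated from the data.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1.0)  # C is a hyperparameter, set at construction
clf.fit(data, target)  # fit() estimates the fitted parameters; on this
                       # unscaled data it may emit a convergence warning
print(clf.coef_)  # coef_ exists only after fit; note the final underscore
```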
Let's create a simple predictive model made of a scaler followed by a logistic regression classifier. As mentioned in previous notebooks, many models, including linear ones, work better if all features have a similar scaling. For this purpose, we use a `StandardScaler`, which transforms the data by rescaling features.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)
```

We can evaluate the generalization performance of the model via cross-validation.

```python
from sklearn.model_selection import cross_validate

cv_results = cross_validate(model, data, target)
scores = cv_results["test_score"]
print(
    "Accuracy score via cross-validation:\n"
    f"{scores.mean():.3f} ± {scores.std():.3f}"
)
```

```
Accuracy score via cross-validation:
0.800 ± 0.003
```

We created the model with the default `C` value, which is equal to 1. If we wanted to use a different `C`, we could have set it when creating the `LogisticRegression` object, with something like `LogisticRegression(C=1e-3)`.

**Note:** For more information on the model hyperparameter `C`, refer to the documentation. Be aware that we will focus on linear models in an upcoming module.

We can also change the parameter of a model after it has been created, with the `set_params` method, which is available for all scikit-learn estimators. For example, we can set `C=1e-3`, then fit and evaluate the model:

```python
model.set_params(classifier__C=1e-3)
cv_results = cross_validate(model, data, target)
scores = cv_results["test_score"]
print(
    "Accuracy score via cross-validation:\n"
    f"{scores.mean():.3f} ± {scores.std():.3f}"
)
```

```
Accuracy score via cross-validation:
0.787 ± 0.002
```

When the model of interest is a `Pipeline`, the parameter names are of the form `<model_name>__<parameter_name>` (note the double underscore in the middle). In our case, `classifier` comes from the `Pipeline` definition and `C` is the parameter name of `LogisticRegression`.

In general, you can use the `get_params` method on scikit-learn models to list all the parameters with their values. For example, to get all the parameter names:

```python
for parameter in model.get_params():
    print(parameter)
```

```
memory
steps
verbose
preprocessor
classifier
preprocessor__copy
preprocessor__with_mean
preprocessor__with_std
classifier__C
classifier__class_weight
classifier__dual
classifier__fit_intercept
classifier__intercept_scaling
classifier__l1_ratio
classifier__max_iter
classifier__multi_class
classifier__n_jobs
classifier__penalty
classifier__random_state
classifier__solver
classifier__tol
classifier__verbose
classifier__warm_start
```

`.get_params()` returns a `dict` whose keys are the parameter names and whose values are the parameter values. To get the value of a single parameter, for example `classifier__C`:

```python
model.get_params()["classifier__C"]
```

```
0.001
```
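The double-underscore convention addresses any step of the pipeline, not only the classifier. As a short sketch (an addition, not part of the original notebook, reusing the `model` defined above), the same mechanism can change a `StandardScaler` parameter:

```python
# Sketch (not in the original notebook): "<step>__<parameter>" reaches any
# pipeline step, here the scaler registered under the name "preprocessor".
model.set_params(preprocessor__with_mean=False)
print(model.get_params()["preprocessor__with_mean"])  # False

# Restore the default so the rest of the notebook is unaffected.
model.set_params(preprocessor__with_mean=True)
```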
We can systematically vary the value of `C` to see if there is an optimal value.

```python
for C in [1e-3, 1e-2, 1e-1, 1, 10]:
    model.set_params(classifier__C=C)
    cv_results = cross_validate(model, data, target)
    scores = cv_results["test_score"]
    print(
        f"Accuracy score via cross-validation with C={C}:\n"
        f"{scores.mean():.3f} ± {scores.std():.3f}"
    )
```

```
Accuracy score via cross-validation with C=0.001:
0.787 ± 0.002
Accuracy score via cross-validation with C=0.01:
0.799 ± 0.003
Accuracy score via cross-validation with C=0.1:
0.800 ± 0.003
Accuracy score via cross-validation with C=1:
0.800 ± 0.003
Accuracy score via cross-validation with C=10:
0.800 ± 0.003
```

We can see that as long as `C` is high enough, the model seems to perform well. What we did here is entirely manual: we scanned the values of `C` and picked the best one by hand. In the next lesson, we will see how to do this automatically; a short preview is sketched at the end of this notebook.

**Warning:** When we evaluate a family of models on test data and pick the best performer, we cannot trust the corresponding prediction accuracy, and we need to apply the selected model to new data. Indeed, the test data has been used to select the model, and it is thus no longer independent from this model.

In this notebook we have seen:

* how to use `get_params` and `set_params` to get the parameters of a model and set them.
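As a preview of the automated tuning covered in the next lesson (this sketch is an addition, not part of the original notebook), the manual loop above can be expressed with scikit-learn's `GridSearchCV`, which runs the same cross-validated scan over `C` and selects the best value for us:

```python
# Sketch (not in the original notebook): the manual scan over C, automated.
from sklearn.model_selection import GridSearchCV

param_grid = {"classifier__C": [1e-3, 1e-2, 1e-1, 1, 10]}
search = GridSearchCV(model, param_grid=param_grid)
search.fit(data, target)

print(search.best_params_)  # the C value with the best cross-validated score
print(f"Best accuracy: {search.best_score_:.3f}")
```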