d2l.ai
108.138.7.90
Public Scan
URL: http://d2l.ai/
Submission: On October 17 via api from US — Scanned from DE
Form analysis
1 form found in the DOM
GET search.html
<form class="form-inline pull-sm-right" action="search.html" method="get">
<div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right has-placeholder is-upgraded" data-upgraded=",MaterialTextfield">
<label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon" for="waterfall-exp" data-upgraded=",MaterialButton" tabindex="0">
<i class="material-icons">search</i>
</label>
<div class="mdl-textfield__expandable-holder">
<input class="mdl-textfield__input" type="text" name="q" id="waterfall-exp" placeholder="Search">
<input type="hidden" name="check_keywords" value="yes">
<input type="hidden" name="area" value="default">
</div>
</div>
<div class="mdl-tooltip" data-mdl-for="quick-search-icon" data-upgraded=",MaterialTooltip"> Quick search </div>
</form>
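As a rough illustration of what the form above would request on submission, the sketch below builds the GET URL from the field names and hidden values in the markup. The function name `build_search_url` and the sample query are hypothetical, and resolving `search.html` against the scanned page URL is an assumption about how a browser would treat the relative action.

```python
from urllib.parse import urlencode, urljoin

# Page URL taken from the scan header; used to resolve the relative form action.
PAGE_URL = "http://d2l.ai/"


def build_search_url(query: str) -> str:
    """Return the URL a GET submission of the search form would likely request.

    Hypothetical helper for illustration only; it is not part of the scanned site.
    """
    # Field names and hidden values are copied from the <form> markup.
    fields = {"q": query, "check_keywords": "yes", "area": "default"}
    # urlencode applies form-style encoding (spaces become '+').
    return urljoin(PAGE_URL, "search.html") + "?" + urlencode(fields)


if __name__ == "__main__":
    print(build_search_url("batch normalization"))
```

The two hidden inputs are sent with every query, so any search issued through this form carries `check_keywords=yes` and `area=default` alongside the user's text.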
Text Content
Dive into Deep Learning
Table Of Contents
* Preface * Installation * Notation
* 1. Introduction
* 2. Preliminaries * 2.1. Data Manipulation * 2.2. Data Preprocessing * 2.3. Linear Algebra * 2.4. Calculus * 2.5. Automatic Differentiation * 2.6. Probability and Statistics * 2.7. Documentation
* 3. Linear Neural Networks for Regression * 3.1. Linear Regression * 3.2. Object-Oriented Design for Implementation * 3.3. Synthetic Regression Data * 3.4. Linear Regression Implementation from Scratch * 3.5. Concise Implementation of Linear Regression * 3.6. Generalization * 3.7. Weight Decay
* 4. Linear Neural Networks for Classification * 4.1. Softmax Regression * 4.2. The Image Classification Dataset * 4.3. The Base Classification Model * 4.4. Softmax Regression Implementation from Scratch * 4.5. Concise Implementation of Softmax Regression * 4.6. Generalization in Classification * 4.7. Environment and Distribution Shift
* 5. Multilayer Perceptrons * 5.1. Multilayer Perceptrons * 5.2. Implementation of Multilayer Perceptrons * 5.3. Forward Propagation, Backward Propagation, and Computational Graphs * 5.4. Numerical Stability and Initialization * 5.5. Generalization in Deep Learning * 5.6. Dropout * 5.7. Predicting House Prices on Kaggle
* 6. Builders’ Guide * 6.1. Layers and Modules * 6.2. Parameter Management * 6.3. Parameter Initialization * 6.4. Lazy Initialization * 6.5. Custom Layers * 6.6. File I/O * 6.7. GPUs
* 7. Convolutional Neural Networks * 7.1. From Fully Connected Layers to Convolutions * 7.2. Convolutions for Images * 7.3. Padding and Stride * 7.4. Multiple Input and Multiple Output Channels * 7.5. Pooling * 7.6. Convolutional Neural Networks (LeNet)
* 8. Modern Convolutional Neural Networks * 8.1. Deep Convolutional Neural Networks (AlexNet) * 8.2. Networks Using Blocks (VGG) * 8.3. Network in Network (NiN) * 8.4. Multi-Branch Networks (GoogLeNet) * 8.5. Batch Normalization * 8.6. Residual Networks (ResNet) and ResNeXt * 8.7. Densely Connected Networks (DenseNet) * 8.8. Designing Convolution Network Architectures
* 9. Recurrent Neural Networks * 9.1. Working with Sequences * 9.2. Converting Raw Text into Sequence Data * 9.3. Language Models * 9.4. Recurrent Neural Networks * 9.5. Recurrent Neural Network Implementation from Scratch * 9.6. Concise Implementation of Recurrent Neural Networks * 9.7. Backpropagation Through Time
* 10. Modern Recurrent Neural Networks * 10.1. Long Short-Term Memory (LSTM) * 10.2. Gated Recurrent Units (GRU) * 10.3. Deep Recurrent Neural Networks * 10.4. Bidirectional Recurrent Neural Networks * 10.5. Machine Translation and the Dataset * 10.6. The Encoder–Decoder Architecture * 10.7. Sequence-to-Sequence Learning for Machine Translation * 10.8. Beam Search
* 11. Attention Mechanisms and Transformers * 11.1. Queries, Keys, and Values * 11.2. Attention Pooling by Similarity * 11.3. Attention Scoring Functions * 11.4. The Bahdanau Attention Mechanism * 11.5. Multi-Head Attention * 11.6. Self-Attention and Positional Encoding * 11.7. The Transformer Architecture * 11.8. Transformers for Vision * 11.9. Large-Scale Pretraining with Transformers
* 12. Optimization Algorithms * 12.1. Optimization and Deep Learning * 12.2. Convexity * 12.3. Gradient Descent * 12.4. Stochastic Gradient Descent * 12.5. Minibatch Stochastic Gradient Descent * 12.6. Momentum * 12.7. Adagrad * 12.8. RMSProp * 12.9. Adadelta * 12.10. Adam * 12.11. Learning Rate Scheduling
* 13. Computational Performance * 13.1. Compilers and Interpreters * 13.2. Asynchronous Computation * 13.3. Automatic Parallelism * 13.4. Hardware * 13.5. Training on Multiple GPUs * 13.6. Concise Implementation for Multiple GPUs * 13.7. Parameter Servers
* 14. Computer Vision * 14.1. Image Augmentation * 14.2. Fine-Tuning * 14.3. Object Detection and Bounding Boxes * 14.4. Anchor Boxes * 14.5. Multiscale Object Detection * 14.6. The Object Detection Dataset * 14.7. Single Shot Multibox Detection * 14.8. Region-based CNNs (R-CNNs) * 14.9. Semantic Segmentation and the Dataset * 14.10. Transposed Convolution * 14.11. Fully Convolutional Networks * 14.12. Neural Style Transfer * 14.13. Image Classification (CIFAR-10) on Kaggle * 14.14. Dog Breed Identification (ImageNet Dogs) on Kaggle
* 15. Natural Language Processing: Pretraining * 15.1. Word Embedding (word2vec) * 15.2. Approximate Training * 15.3. The Dataset for Pretraining Word Embeddings * 15.4. Pretraining word2vec * 15.5. Word Embedding with Global Vectors (GloVe) * 15.6. Subword Embedding * 15.7. Word Similarity and Analogy * 15.8. Bidirectional Encoder Representations from Transformers (BERT) * 15.9. The Dataset for Pretraining BERT * 15.10. Pretraining BERT
* 16. Natural Language Processing: Applications * 16.1. Sentiment Analysis and the Dataset * 16.2. Sentiment Analysis: Using Recurrent Neural Networks * 16.3. Sentiment Analysis: Using Convolutional Neural Networks * 16.4. Natural Language Inference and the Dataset * 16.5. Natural Language Inference: Using Attention * 16.6. Fine-Tuning BERT for Sequence-Level and Token-Level Applications * 16.7. Natural Language Inference: Fine-Tuning BERT
* 17. Reinforcement Learning * 17.1. Markov Decision Process (MDP) * 17.2. Value Iteration * 17.3. Q-Learning
* 18. Gaussian Processes * 18.1. Introduction to Gaussian Processes * 18.2. Gaussian Process Priors * 18.3. Gaussian Process Inference
* 19. Hyperparameter Optimization * 19.1. What Is Hyperparameter Optimization? * 19.2. Hyperparameter Optimization API * 19.3. Asynchronous Random Search * 19.4. Multi-Fidelity Hyperparameter Optimization * 19.5. Asynchronous Successive Halving
* 20. Generative Adversarial Networks * 20.1. Generative Adversarial Networks * 20.2. Deep Convolutional Generative Adversarial Networks
* 21. Recommender Systems * 21.1. Overview of Recommender Systems * 21.2. The MovieLens Dataset * 21.3. Matrix Factorization * 21.4. AutoRec: Rating Prediction with Autoencoders * 21.5. Personalized Ranking for Recommender Systems * 21.6. Neural Collaborative Filtering for Personalized Ranking * 21.7. Sequence-Aware Recommender Systems * 21.8. Feature-Rich Recommender Systems * 21.9. Factorization Machines * 21.10. Deep Factorization Machines
* 22. Appendix: Mathematics for Deep Learning * 22.1. Geometry and Linear Algebraic Operations * 22.2. Eigendecompositions * 22.3. Single Variable Calculus * 22.4. Multivariable Calculus * 22.5. Integral Calculus * 22.6. Random Variables * 22.7. Maximum Likelihood * 22.8. Distributions * 22.9. Naive Bayes * 22.10. Statistics * 22.11. Information Theory
* 23. Appendix: Tools for Deep Learning * 23.1. Using Jupyter Notebooks * 23.2. Using Amazon SageMaker * 23.3. Using AWS EC2 Instances * 23.4. Using Google Colab * 23.5. Selecting Servers and GPUs * 23.6. Contributing to This Book * 23.7. Utility Functions and Classes * 23.8. The d2l API Document
* References
DIVE INTO DEEP LEARNING
Interactive deep learning book with code, math, and discussions
Implemented with PyTorch, NumPy/MXNet, JAX, and TensorFlow
Adopted at 500 universities from 70 countries
* [Feb 2023] The book is forthcoming on Cambridge University Press (order). The Chinese version is the best seller at the largest Chinese online bookstore. Follow D2L's open-source project for the latest updates.
* [Dec 2022] JAX implementation is available! New topics of reinforcement learning, Gaussian processes, and hyperparameter optimization are added!
* [Jul 2022] Check out our new API for implementation and new topics like generalization in classification and deep learning, ResNeXt, CNN design space, and transformers for vision and large-scale pretraining.
* [May 2022] Join us to improve ongoing translations in Portuguese, Turkish, Vietnamese, Korean, and Japanese.
* [Dec 2021] We added a new option to run this book for free: check out SageMaker Studio Lab.
* [May 2021] Slides, Jupyter notebooks, assignments, and videos of the Berkeley course can be found at the syllabus page.
AUTHORS
ASTON ZHANG, Amazon
ZACK C. LIPTON, CMU and Amazon
MU LI, Amazon
ALEX J. SMOLA, Amazon
VOL.2 CHAPTER AUTHORS
PRATIK CHAUDHARI, UPenn and Amazon, Reinforcement Learning
RASOOL FAKOOR, Amazon, Reinforcement Learning
KAVOSH ASADI, Amazon, Reinforcement Learning
ANDREW GORDON WILSON, NYU and Amazon, Gaussian Processes
AARON KLEIN, Amazon, Hyperparameter Optimization
MATTHIAS SEEGER, Amazon, Hyperparameter Optimization
CEDRIC ARCHAMBEAU, Amazon, Hyperparameter Optimization
SHUAI ZHANG, Amazon, Recommender Systems
YI TAY, Google, Recommender Systems
BRENT WERNESS, Amazon, Mathematics for Deep Learning
RACHEL HU, Amazon, Mathematics for Deep Learning
FRAMEWORK ADAPTATION AUTHORS
ANIRUDH DAGAR, Amazon, PyTorch Adaptation, JAX Adaptation
YUAN TANG, Akuity, TensorFlow Adaptation
WE THANK ALL THE COMMUNITY CONTRIBUTORS FOR MAKING THIS OPEN SOURCE BOOK BETTER FOR EVERYONE.
CONTRIBUTE TO THE BOOK
EACH SECTION IS AN EXECUTABLE JUPYTER NOTEBOOK
You can modify the code and tune hyperparameters to get instant feedback to accumulate practical experiences in deep learning.
Run locally, Amazon SageMaker Studio Lab, Amazon SageMaker, Google Colab
MATHEMATICS + FIGURES + CODE
We offer an interactive learning experience with mathematics, figures, code, text, and discussions, where concepts and techniques are illustrated and implemented with experiments on real data sets.
ACTIVE COMMUNITY SUPPORT
You can discuss and learn with thousands of peers in the community through the link provided in each section.
D2L AS A TEXTBOOK OR A REFERENCE BOOK
Incomplete list:
Abasyn University, Islamabad Campus Alexandria University Amirkabir University of Technology Amity University Amrita Vishwa Vidyapeetham University Anna University Anna University Regional Campus Madurai Ateneo de Naga University Australian National University Bar-Ilan University Barnard College Beijing Forestry University Birla Institute of Technology and Science, Hyderabad Birla Institute of Technology and Science, Pilani BML Munjal University Boston College Boston University Brac University Brandeis University Brown University Brunel University London Cairo University California State University, Northridge Cankaya University Carnegie Mellon University Center for Research and Advanced Studies of the National Polytechnic Institute Chalmers University of Technology Chennai Mathematical Institute Chouaib Doukkali University Chulalongkorn University City College of New York City University of Hong Kong City University of Science and Information Technology College of Engineering Pune Columbia University Cornell University Cyprus Institute Deakin University Diponegoro University Dresden University of Technology Duke University Durban University of Technology Eastern Mediterranean University Ecole Nationale Supérieure d'Informatique Ecole Nationale Supérieure de Cognitique École Nationale Supérieure de Techniques Avancées Eindhoven University of Technology Emory University Eötvös Loránd University Escuela Politécnica Nacional Escuela Superior Politecnica del Litoral Federal University Lokoja Feng Chia University Fisk University Florida Atlantic University FPT University Fudan University Ganpat University Gayatri Vidya Parishad College of Engineering (Autonomous) Gazi Üniversitesi Gdańsk University of Technology George Mason University Georgetown University Georgia Institute of Technology Gheorghe Asachi Technical University of Iaşi Golden Gate University Great Lakes Institute of Management Gwangju Institute of Science and Technology Habib University Hamad Bin Khalifa
University Hangzhou Dianzi University Hankuk University of Foreign Studies Harare Institute of Technology Harbin Institute of Technology Harvard University Hasso-Plattner-Institut Hebrew University of Jerusalem Heinrich-Heine-Universität Düsseldorf Henan Institute of Technology Hertie School Higher Institute of Applied Science and Technology of Sousse Hiroshima University Ho Chi Minh City University of Foreign Languages and Information Technology Hochschule Bremen Hochschule für Technik und Wirtschaft Hochschule Hamm-Lippstadt Hong Kong University of Science and Technology Houston Community College Huazhong University of Science and Technology Humboldt-Universität zu Berlin İbn Haldun Üniversitesi Icahn School of Medicine at Mount Sinai Imperial College London IMT Mines Alès Indian Institute of Technology Bombay Indian Institute of Technology Hyderabad Indian Institute of Technology Jodhpur Indian Institute of Technology Kanpur Indian Institute of Technology Kharagpur Indian Institute of Technology Mandi Indian Institute of Technology Ropar Indian School of Business Indira Gandhi National Open University Indraprastha Institute of Information Technology, Delhi Institut catholique d'arts et métiers (ICAM) Institut de recherche en informatique de Toulouse Institut Supérieur d'Informatique et des Techniques de Communication Institut Supérieur De L'electronique Et Du Numérique Institut Teknologi Bandung Instituto Federal de Educação, Ciência e Tecnologia de São Paulo, Campus Salto Instituto Politécnico Nacional Instituto Tecnológico Autónomo de México Instituto Tecnológico de Buenos Aires Islamic University of Medina İstanbul Teknik Üniversitesi IT-Universitetet i København Ivan Franko National University of Lviv Jeonbuk National University Johns Hopkins University Julius-Maximilians-Universität Würzburg Keio University King Abdullah University of Science and Technology King Fahd University of Petroleum and Minerals King Faisal University Kongu
Engineering College Korea Aerospace University KPR Institute of Engineering and Technology Kyungpook National University Lancaster University Leading University Leibniz Universität Hannover Leuphana University of Lüneburg London School of Economics & Political Science M.S.Ramaiah University of Applied Sciences Make School Masaryk University Massachusetts Institute of Technology Maynooth University McGill University Menoufia University Milwaukee School of Engineering Minia University Mississippi State University Missouri University of Science and Technology Mohammad Ali Jinnah University Mohammed V University in Rabat Monash University Multimedia University Murdoch University Nanchang Hangkong University Nanjing Medical University Nanjing University National Chung Hsing University National Institute of Technical Teachers Training & Research National Institute of Technology Trichy National Institute of Technology, Warangal National Sun Yat-sen University National Taichung University of Science and Technology National Taiwan University National Technical University of Athens National Technical University of Ukraine National United University National University of Sciences and Technology National University of Singapore Nazarbayev University New Jersey Institute of Technology New Mexico Institute of Mining and Technology New Mexico State University New York University Newman University North Ossetian State University NorthCap University Northeastern University Northwestern Polytechnical University Northwestern University Ohio University Pakuan University Peking University Pennsylvania State University Pohang University of Science and Technology Politechnika Białostocka Politecnico di Milano Politeknik Negeri Semarang Pomona College Pontificia Universidad Católica de Chile Pontificia Universidad Católica del Perú Portland State University Punjabi University Purdue University Purdue University Northwest Quaid-e-Azam University Queen Mary University of
London Queen's University Radboud Universiteit Radboud University Rajiv Gandhi Institute of Petroleum Technology Rensselaer Polytechnic Institute Rowan University Rutgers, The State University of New Jersey RVS Institute of Management Studies and Research RWTH Aachen University Sant Longowal Institute of Engineering Technology Santa Clara University Sapienza Università di Roma Seoul National University Seoul National University of Science and Technology Shanghai Jiao Tong University Shanghai University of Electric Power Shanghai University of Finance and Economics Shantilal Shah Engineering College Sharif University of Technology Shenzhen University Shivaji University, Kolhapur Simon Fraser University Singapore University of Technology and Design Sogang University Sookmyung Women's University Southern Connecticut State University Southern New Hampshire University St. Pölten University of Applied Sciences Stanford University State University of New York at Albany State University of New York at Binghamton State University of New York at Fredonia Stellenbosch University Stevens Institute of Technology Sungkyunkwan University Technion - Israel Institute of Technology Technische Universität Berlin Technische Universität München Technische Universiteit Delft Tecnológico de Monterrey, Campus Guadalajara Tekirdağ Namık Kemal Üniversitesi Télécom Paris Telkom University Texas A&M University Thapar Institute of Engineering and Technology Tsinghua University Tufts University Umeå University Universidad Carlos III de Madrid Universidad de Ibagué Universidad de Ingeniería y Tecnología - UTEC Universidad de Salamanca Universidad de Zaragoza Universidad del Norte, Colombia Universidad Icesi Universidad Militar Nueva Granada Universidad Nacional Agraria La Molina Universidad Nacional Autónoma de México Universidad Nacional de Colombia Sede Manizales Universidad Nacional de Tierra del Fuego Universidad Politécnica de Chiapas Universidad Politécnica de Valencia Universidad 
Politécnica Salesiana, Cuenca Universidad Rafael Landivar Universidad Rey Juan Carlos Universidad San Francisco de Quito Universidad Tecnológica de Pereira Universidad Tecnológica Nacional Universidade Católica de Brasília Universidade Estadual de Campinas Universidade Federal de Goiás Universidade Federal de Minas Gerais Universidade Federal de Ouro Preto Universidade Federal de Pernambuco Universidade Federal de São Carlos Universidade Federal de Viçosa Universidade Federal do Pampa Universidade Federal do Rio Grande Universidade NOVA de Lisboa Universidade Presbiteriana Mackenzie Universidade Tecnológica Federal do Paraná Università Cattolica del Sacro Cuore Università degli Studi di Bari Aldo Moro Università degli Studi di Brescia Università degli Studi di Catania Università degli Studi di Padova Universitas Andalas, Padang Universitas Indonesia Universitas Negeri Yogyakarta Universitas Udayana Universität Bremen Universitat de Barcelona Universitat de València Universität Heidelberg Universität Leipzig Universitat Politècnica de Catalunya Universitatea Babeș-Bolyai Universitatea de Vest din Timișoara Université Abderrahmane Mira de Béjaïa Université Clermont Auvergne Université Côte d'Azur Université de Caen Normandie Université de Rouen Normandie Université de technologie de Compiègne Université Paris-Saclay Université Toulouse 1 Capitole University of Akron University of Alabama in Huntsville University of Allahabad University of Applied Sciences Würzburg-Schweinfurt University of Arkansas University of Augsburg University of Baghdad University of Bath University of Bordj Bou Arreridj University of British Columbia University of California, Berkeley University of California, Irvine University of California, Los Angeles University of California, San Diego University of California, Santa Barbara University of California, Santa Cruz University of Cambridge University of Canberra University of Catania University of Cincinnati University of Colorado Boulder 
University of Connecticut University of Copenhagen University of Derby University of Florida University of Genoa University of Ghana University of Groningen University of Hamburg University of Houston University of Hull University of Iceland University of Idaho University of Illinois at Urbana-Champaign University of International Business and Economics University of Klagenfurt University of Liège University of Louisiana at Lafayette University of Maryland University of Maryland Baltimore County University of Massachusetts Lowell University of Michigan University of Michigan Dearborn University of Milano-Bicocca University of Minnesota, Twin Cities University of Moratuwa University of Nebraska Omaha University of New Hampshire University of Newcastle University of North Carolina at Chapel Hill University of North Texas University of Northern Philippines University of Nottingham University of Oslo University of Pennsylvania University of Pittsburgh University of Rostock University of São Paulo University of Science and Technology of China University of Southern California University of Southern Maine University of St Andrews University of St. 
Thomas University of Suffolk University of Sydney University of Szeged University of Technology Sydney University of Tehran University of Texas at Austin University of Texas at Dallas University of Texas Rio Grande Valley University of Udine University of Warsaw University of Washington University of Waterloo University of Wisconsin Madison Univerzita Komenského v Bratislave Uniwersytet Jagielloński Vardhaman College of Engineering Vardhman Mahaveer Open University Vietnamese-German University Vignana Jyothi Institute Of Management Vilnius University Wageningen University West Virginia University Western University Wichita State University Xavier University Bhubaneswar Xi'an Jiaotong Liverpool University Xiamen University Xianning Vocational Technical College Yale University Yeshiva University Yıldız Teknik Üniversitesi Yonsei University Yunnan University Zhejiang University

BIBTEX ENTRY FOR CITING THE BOOK

@book{zhang2023dive,
  title={Dive into Deep Learning},
  author={Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J.},
  publisher={Cambridge University Press},
  note={\url{https://D2L.ai}},
  year={2023}
}

TABLE OF CONTENTS

* Preface
* Installation
* Notation
* 1. Introduction
  * 1.1. A Motivating Example
  * 1.2. Key Components
  * 1.3. Kinds of Machine Learning Problems
  * 1.4. Roots
  * 1.5. The Road to Deep Learning
  * 1.6. Success Stories
  * 1.7. The Essence of Deep Learning
  * 1.8. Summary
  * 1.9. Exercises
* 2. Preliminaries
  * 2.1. Data Manipulation
  * 2.2. Data Preprocessing
  * 2.3. Linear Algebra
  * 2.4. Calculus
  * 2.5. Automatic Differentiation
  * 2.6. Probability and Statistics
  * 2.7. Documentation
* 3. Linear Neural Networks for Regression
  * 3.1. Linear Regression
  * 3.2. Object-Oriented Design for Implementation
  * 3.3. Synthetic Regression Data
  * 3.4. Linear Regression Implementation from Scratch
  * 3.5. Concise Implementation of Linear Regression
  * 3.6. Generalization
  * 3.7. Weight Decay
* 4. Linear Neural Networks for Classification
  * 4.1. Softmax Regression
  * 4.2. The Image Classification Dataset
  * 4.3. The Base Classification Model
  * 4.4. Softmax Regression Implementation from Scratch
  * 4.5. Concise Implementation of Softmax Regression
  * 4.6. Generalization in Classification
  * 4.7. Environment and Distribution Shift
* 5. Multilayer Perceptrons
  * 5.1. Multilayer Perceptrons
  * 5.2. Implementation of Multilayer Perceptrons
  * 5.3. Forward Propagation, Backward Propagation, and Computational Graphs
  * 5.4. Numerical Stability and Initialization
  * 5.5. Generalization in Deep Learning
  * 5.6. Dropout
  * 5.7. Predicting House Prices on Kaggle
* 6. Builders’ Guide
  * 6.1. Layers and Modules
  * 6.2. Parameter Management
  * 6.3. Parameter Initialization
  * 6.4. Lazy Initialization
  * 6.5. Custom Layers
  * 6.6. File I/O
  * 6.7. GPUs
* 7. Convolutional Neural Networks
  * 7.1. From Fully Connected Layers to Convolutions
  * 7.2. Convolutions for Images
  * 7.3. Padding and Stride
  * 7.4. Multiple Input and Multiple Output Channels
  * 7.5. Pooling
  * 7.6. Convolutional Neural Networks (LeNet)
* 8. Modern Convolutional Neural Networks
  * 8.1. Deep Convolutional Neural Networks (AlexNet)
  * 8.2. Networks Using Blocks (VGG)
  * 8.3. Network in Network (NiN)
  * 8.4. Multi-Branch Networks (GoogLeNet)
  * 8.5. Batch Normalization
  * 8.6. Residual Networks (ResNet) and ResNeXt
  * 8.7. Densely Connected Networks (DenseNet)
  * 8.8. Designing Convolution Network Architectures
* 9. Recurrent Neural Networks
  * 9.1. Working with Sequences
  * 9.2. Converting Raw Text into Sequence Data
  * 9.3. Language Models
  * 9.4. Recurrent Neural Networks
  * 9.5. Recurrent Neural Network Implementation from Scratch
  * 9.6. Concise Implementation of Recurrent Neural Networks
  * 9.7. Backpropagation Through Time
* 10. Modern Recurrent Neural Networks
  * 10.1. Long Short-Term Memory (LSTM)
  * 10.2. Gated Recurrent Units (GRU)
  * 10.3. Deep Recurrent Neural Networks
  * 10.4. Bidirectional Recurrent Neural Networks
  * 10.5. Machine Translation and the Dataset
  * 10.6. The Encoder–Decoder Architecture
  * 10.7. Sequence-to-Sequence Learning for Machine Translation
  * 10.8. Beam Search
* 11. Attention Mechanisms and Transformers
  * 11.1. Queries, Keys, and Values
  * 11.2. Attention Pooling by Similarity
  * 11.3. Attention Scoring Functions
  * 11.4. The Bahdanau Attention Mechanism
  * 11.5. Multi-Head Attention
  * 11.6. Self-Attention and Positional Encoding
  * 11.7. The Transformer Architecture
  * 11.8. Transformers for Vision
  * 11.9. Large-Scale Pretraining with Transformers
* 12. Optimization Algorithms
  * 12.1. Optimization and Deep Learning
  * 12.2. Convexity
  * 12.3. Gradient Descent
  * 12.4. Stochastic Gradient Descent
  * 12.5. Minibatch Stochastic Gradient Descent
  * 12.6. Momentum
  * 12.7. Adagrad
  * 12.8. RMSProp
  * 12.9. Adadelta
  * 12.10. Adam
  * 12.11. Learning Rate Scheduling
* 13. Computational Performance
  * 13.1. Compilers and Interpreters
  * 13.2. Asynchronous Computation
  * 13.3. Automatic Parallelism
  * 13.4. Hardware
  * 13.5. Training on Multiple GPUs
  * 13.6. Concise Implementation for Multiple GPUs
  * 13.7. Parameter Servers
* 14. Computer Vision
  * 14.1. Image Augmentation
  * 14.2. Fine-Tuning
  * 14.3. Object Detection and Bounding Boxes
  * 14.4. Anchor Boxes
  * 14.5. Multiscale Object Detection
  * 14.6. The Object Detection Dataset
  * 14.7. Single Shot Multibox Detection
  * 14.8. Region-based CNNs (R-CNNs)
  * 14.9. Semantic Segmentation and the Dataset
  * 14.10. Transposed Convolution
  * 14.11. Fully Convolutional Networks
  * 14.12. Neural Style Transfer
  * 14.13. Image Classification (CIFAR-10) on Kaggle
  * 14.14. Dog Breed Identification (ImageNet Dogs) on Kaggle
* 15. Natural Language Processing: Pretraining
  * 15.1. Word Embedding (word2vec)
  * 15.2. Approximate Training
  * 15.3. The Dataset for Pretraining Word Embeddings
  * 15.4. Pretraining word2vec
  * 15.5. Word Embedding with Global Vectors (GloVe)
  * 15.6. Subword Embedding
  * 15.7. Word Similarity and Analogy
  * 15.8. Bidirectional Encoder Representations from Transformers (BERT)
  * 15.9. The Dataset for Pretraining BERT
  * 15.10. Pretraining BERT
* 16. Natural Language Processing: Applications
  * 16.1. Sentiment Analysis and the Dataset
  * 16.2. Sentiment Analysis: Using Recurrent Neural Networks
  * 16.3. Sentiment Analysis: Using Convolutional Neural Networks
  * 16.4. Natural Language Inference and the Dataset
  * 16.5. Natural Language Inference: Using Attention
  * 16.6. Fine-Tuning BERT for Sequence-Level and Token-Level Applications
  * 16.7. Natural Language Inference: Fine-Tuning BERT
* 17. Reinforcement Learning
  * 17.1. Markov Decision Process (MDP)
  * 17.2. Value Iteration
  * 17.3. Q-Learning
* 18. Gaussian Processes
  * 18.1. Introduction to Gaussian Processes
  * 18.2. Gaussian Process Priors
  * 18.3. Gaussian Process Inference
* 19. Hyperparameter Optimization
  * 19.1. What Is Hyperparameter Optimization?
  * 19.2. Hyperparameter Optimization API
  * 19.3. Asynchronous Random Search
  * 19.4. Multi-Fidelity Hyperparameter Optimization
  * 19.5. Asynchronous Successive Halving
* 20. Generative Adversarial Networks
  * 20.1. Generative Adversarial Networks
  * 20.2. Deep Convolutional Generative Adversarial Networks
* 21. Recommender Systems
  * 21.1. Overview of Recommender Systems
  * 21.2. The MovieLens Dataset
  * 21.3. Matrix Factorization
  * 21.4. AutoRec: Rating Prediction with Autoencoders
  * 21.5. Personalized Ranking for Recommender Systems
  * 21.6. Neural Collaborative Filtering for Personalized Ranking
  * 21.7. Sequence-Aware Recommender Systems
  * 21.8. Feature-Rich Recommender Systems
  * 21.9. Factorization Machines
  * 21.10. Deep Factorization Machines
* 22. Appendix: Mathematics for Deep Learning
  * 22.1. Geometry and Linear Algebraic Operations
  * 22.2. Eigendecompositions
  * 22.3. Single Variable Calculus
  * 22.4. Multivariable Calculus
  * 22.5. Integral Calculus
  * 22.6. Random Variables
  * 22.7. Maximum Likelihood
  * 22.8. Distributions
  * 22.9. Naive Bayes
  * 22.10. Statistics
  * 22.11. Information Theory
* 23. Appendix: Tools for Deep Learning
  * 23.1. Using Jupyter Notebooks
  * 23.2. Using Amazon SageMaker
  * 23.3. Using AWS EC2 Instances
  * 23.4. Using Google Colab
  * 23.5. Selecting Servers and GPUs
  * 23.6. Contributing to This Book
  * 23.7. Utility Functions and Classes
  * 23.8. The d2l API Document
* References