URL:
https://mlsafety-course.pages.dev/
Submission: On July 13 via automatic, source certstream-suspicious — Scanned from NL
Form analysis: 0 forms found in the DOM

Text content:
Intro to ML Safety

This site uses Just the Docs, a documentation theme for Jekyll.

SCHEDULE

Legend: 🎥 lecture recording, 🖥️ slides, 📖 notes, 📝 written questions, ⌨️ coding assignment.

Apply to participate in the course program by Jan 29th to have your assignments graded and to take part in discussions and speaker events.

BACKGROUND

1. Introduction (🎥, 🖥️)
2. Optional Deep Learning Review (🎥, 🖥️, 📖, 📝, ⌨️): building blocks, optimizers, losses, datasets

SAFETY ENGINEERING

3. Risk Decomposition (🎥, 🖥️): risk analysis definitions, disaster risk equation, decomposition of safety areas, ability to cope and existential risk
4. Accident Models (🎥, 🖥️): FMEA, Bow Tie model, Swiss Cheese model, defense in depth, preventative and protective measures, complex systems, nonlinear causality, emergence, STAMP
5. Black Swans (🎥, 🖥️): unknown unknowns, long-tailed distributions, multiplicative processes, extremistan

► Review questions 📝

ROBUSTNESS

6. Adversarial Robustness (🎥, 🖥️, 📖, ⌨️): optimization pressure, PGD, untargeted vs. targeted attacks, adversarial evaluation, white box vs. black box, transferability, unforeseen attacks, text attacks, robustness certificates
7. Black Swan Robustness (🎥, 🖥️, 📖): stress tests, train-test mismatch, adversarial distribution shifts, simulated scenarios for robustness

► Review questions 📝

MONITORING

8. Anomaly Detection (🎥, 🖥️, 📖, ⌨️): AUROC/AUPR/FPR95, likelihoods and detection, MSP baseline, OE, ViM, anomaly datasets, one-class learning, detecting adversaries, error detection
9. Interpretable Uncertainty (🎥, 🖥️, 📖): calibration vs. sharpness, proper scoring rules, Brier score, RMS calibration error, reliability diagrams, confidence intervals, quantile prediction
10. Transparency (🎥, 🖥️): saliency maps, token heatmaps, feature visualizations, ProtoPNet
11. Trojans (🎥, 🖥️, 📖, ⌨️): hidden functionality from poisoning, treacherous turns
12. Detecting Emergent Behavior (🎥, 🖥️, 📖): emergent capabilities, instrumental convergence, Goodhart's law, proxy gaming

► Review questions 📝

ALIGNMENT

13. Honest Models (🎥, 🖥️): truthful vs. honest, inverse scaling, instances of model dishonesty
14. Power Aversion (🖥️): TBC early 2023; social, economic, and governmental formalizations of power bases; power penalties
15. Machine Ethics (🎥, 🖥️, ⌨️): normative ethics background, human values, value learning with comparisons, translating moral knowledge into action, moral parliament, value clarification

SYSTEMIC SAFETY

16. ML for Improved Decision-Making (🎥, 🖥️, 📖): forecasting, brainstorming
17. ML for Cyberdefense (🎥, 🖥️): intrusion detection, detecting malicious programs, automated patching, fuzzing
18. Cooperative AI (🎥, 🖥️, 📖): Nash equilibria, dominant strategies, stag hunt, Pareto improvements, cooperation mechanisms, morality as cooperation, cooperative dispositions, collusion externalities

ADDITIONAL EXISTENTIAL RISK DISCUSSION

19. X-Risk Overview (🎥, 🖥️): arguments for x-risk
20. Possible Existential Hazards (🎥, 🖥️): weaponization, proxy gaming, treacherous turn, deceptive alignment, value lock-in, persuasive AI
21. Safety-Capabilities Balance (🎥, 🖥️): theories of impact, differential technological progress, capabilities externalities
22. Natural Selection Favors AIs over Humans (🎥, 🖥️): Lewontin's conditions, multiple AI agents, generalized Darwinism, mechanisms for cooperation
23. Review and Conclusion (🎥, 🖥️, 📝): pillars of ML safety research, task-train-deploy pipeline

--------------------------------------------------------------------------------

Copyright © 2023. Created by Dan Hendrycks at the Center for AI Safety