URL: https://mlsafety-course.pages.dev/

Intro to ML Safety


SCHEDULE

Legend: 🎥 lecture recording, 🖥️ slides, 📖 notes, 📝 written questions, ⌨️ coding assignment.

Apply to the course program by Jan 29th to have your assignments graded and to take part in discussions and speaker events.


BACKGROUND

1. Introduction 🎥, 🖥️
2. Optional Deep Learning Review 🎥, 🖥️, 📖, 📝, ⌨️: building blocks, optimizers, losses, datasets


SAFETY ENGINEERING

3. Risk Decomposition 🎥, 🖥️: risk analysis definitions, disaster risk equation, decomposition of safety areas, ability to cope and existential risk
4. Accident Models 🎥, 🖥️: FMEA, Bow Tie model, Swiss Cheese model, defense in depth, preventative and protective measures, complex systems, nonlinear causality, emergence, STAMP
5. Black Swans 🎥, 🖥️: unknown unknowns, long-tailed distributions, multiplicative processes, Extremistan

Review questions 📝
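For reference, the disaster risk equation mentioned under Risk Decomposition is often sketched as a product of factors; the exact factor names and the treatment of ability to cope vary by presentation, so take this as a rough decomposition rather than the course's definitive statement:

```latex
\text{Risk} \;\approx\; \text{Vulnerability} \times \text{Exposure} \times \text{Hazard (probability and severity)}
```

A low ability to cope makes a realized hazard hard or impossible to recover from, which is what pushes a risk toward being existential.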


ROBUSTNESS

6. Adversarial Robustness 🎥, 🖥️, 📖, ⌨️: optimization pressure, PGD, untargeted vs. targeted attacks, adversarial evaluation, white box vs. black box, transferability, unforeseen attacks, text attacks, robustness certificates
7. Black Swan Robustness 🎥, 🖥️, 📖: stress tests, train-test mismatch, adversarial distribution shifts, simulated scenarios for robustness
8. Review questions 📝
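The PGD attack listed under Adversarial Robustness (lecture 6) fits in a few lines. Below is a minimal NumPy sketch of the untargeted L-infinity variant run against a toy linear loss with an analytic gradient; the function names and toy loss are illustrative, not taken from the course's coding assignment:

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.1, step=0.02, n_steps=20):
    """L-inf PGD: repeated signed-gradient ascent on the loss,
    projected back into the eps-ball around the clean input x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + step * np.sign(g)         # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv

# Toy loss L(x) = w . x, whose gradient is the constant vector w.
w = np.array([1.0, -2.0, 0.5])
x0 = np.zeros(3)
x_adv = pgd_attack(x0, lambda x: w, eps=0.1)
# With enough steps the attack saturates the budget: x_adv = eps * sign(w).
```

The same loop structure carries over to neural networks, where `grad_fn` is the input gradient of the classification loss obtained by backpropagation.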


MONITORING

8. Anomaly Detection 🎥, 🖥️, 📖, ⌨️: AUROC/AUPR/FPR95, likelihoods and detection, MSP baseline, OE, ViM, anomaly datasets, one-class learning, detecting adversaries, error detection
9. Interpretable Uncertainty 🎥, 🖥️, 📖: calibration vs. sharpness, proper scoring rules, Brier score, RMS calibration error, reliability diagrams, confidence intervals, quantile prediction
10. Transparency 🎥, 🖥️: saliency maps, token heatmaps, feature visualizations, ProtoPNet
11. Trojans 🎥, 🖥️, 📖, ⌨️: hidden functionality from poisoning, treacherous turns
12. Detecting Emergent Behavior 🎥, 🖥️, 📖: emergent capabilities, instrumental convergence, Goodhart's law, proxy gaming
13. Review questions 📝
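Two of the anomaly-detection building blocks from lecture 8, the MSP baseline and AUROC, are small enough to sketch directly. The helper names and toy logits below are invented for illustration; AUROC is computed here by its pairwise-ranking definition rather than a library call:

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability: tends to be high on in-distribution
    inputs and lower on out-of-distribution (OOD) inputs."""
    z = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def auroc(scores_in, scores_out):
    """P(random in-distribution score > random OOD score), ties count half."""
    wins = (scores_in[:, None] > scores_out[None, :]).sum()
    ties = (scores_in[:, None] == scores_out[None, :]).sum()
    return (wins + 0.5 * ties) / (scores_in.size * scores_out.size)

# Confident (in-distribution-like) vs. diffuse (OOD-like) toy logits.
logits_in = np.array([[5.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
logits_out = np.array([[0.2, 0.1, 0.0], [0.0, 0.1, 0.2]])
a = auroc(msp_score(logits_in), msp_score(logits_out))
```

An AUROC of 0.5 means the detector is no better than chance; 1.0 means the score perfectly separates in-distribution from OOD inputs.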


ALIGNMENT

13. Honest Models 🎥, 🖥️: truthful vs. honest, inverse scaling, instances of model dishonesty
14. Power Aversion 🖥️: TBC early 2023; social, economic, and governmental formalizations of power bases; power penalties
15. Machine Ethics 🎥, 🖥️, ⌨️: normative ethics background, human values, value learning with comparisons, translating moral knowledge into action, moral parliament, value clarification


SYSTEMIC SAFETY

16. ML for Improved Decision-Making 🎥, 🖥️, 📖: forecasting, brainstorming
17. ML for Cyberdefense 🎥, 🖥️: intrusion detection, detecting malicious programs, automated patching, fuzzing
18. Cooperative AI 🎥, 🖥️, 📖: Nash equilibria, dominant strategies, stag hunt, Pareto improvements, cooperation mechanisms, morality as cooperation, cooperative dispositions, collusion externalities
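The game-theory vocabulary from Cooperative AI (Nash equilibria, stag hunt, Pareto improvements) can be illustrated with a brute-force equilibrium check over a 2x2 game. The payoff numbers below are a conventional stag-hunt example, not taken from the course materials:

```python
from itertools import product

# Stag hunt payoffs: payoff[(row_action, col_action)] = (row_payoff, col_payoff),
# with action 0 = hunt stag, action 1 = hunt hare.
payoff = {
    (0, 0): (4, 4),  # both hunt stag: best joint outcome
    (0, 1): (0, 3),  # a lone stag hunter gets nothing
    (1, 0): (3, 0),
    (1, 1): (3, 3),  # both hunt hare: safe but worse
}

def is_nash(a, b):
    """Pure-strategy Nash equilibrium: neither player can gain
    by unilaterally switching their own action."""
    u_row, u_col = payoff[(a, b)]
    return (all(payoff[(a2, b)][0] <= u_row for a2 in (0, 1)) and
            all(payoff[(a, b2)][1] <= u_col for b2 in (0, 1)))

equilibria = [ab for ab in product((0, 1), repeat=2) if is_nash(*ab)]
# The stag hunt has two pure equilibria: (stag, stag) and (hare, hare).
```

Note that (stag, stag) is a Pareto improvement over (hare, hare), yet (hare, hare) is the safer choice when each player doubts the other will cooperate; this gap between the payoff-dominant and risk-dominant equilibrium is one reason cooperation mechanisms matter.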


ADDITIONAL EXISTENTIAL RISK DISCUSSION

19. X-Risk Overview 🎥, 🖥️: arguments for x-risk
20. Possible Existential Hazards 🎥, 🖥️: weaponization, proxy gaming, treacherous turn, deceptive alignment, value lock-in, persuasive AI
21. Safety-Capabilities Balance 🎥, 🖥️: theories of impact, differential technological progress, capabilities externalities
22. Natural Selection Favors AIs over Humans 🎥, 🖥️: Lewontin's conditions, multiple AI agents, generalized Darwinism, mechanisms for cooperation
23. Review and Conclusion 🎥, 🖥️, 📝: pillars of ML safety research, task-train-deploy pipeline

--------------------------------------------------------------------------------

Copyright © 2023. Created by Dan Hendrycks at the Center for AI Safety