ai-plans.com
SUBMIT, CRITIQUE AND RANK AI ALIGNMENT PLANS
PLANS CURRENTLY RANKED BY: ∑STRENGTHS - ∑VULNERABILITIES
TOPICS: All, Ethics, Interpretability, Oversight, Philosophy, Governance, Value Learning, Inverse Reinforcement Learning, Corrigibility, Cooperative Inverse RL, Reward Modelling, Safe Exploration, Adversarial Training

1. REACT: OUT-OF-DISTRIBUTION DETECTION WITH RECTIFIED ACTIVATIONS
attributed to: Yiyou Sun, Chuan Guo, Yixuan Li
posted by: KabirKumar
Out-of-distribution (OOD) detection has received much attention lately due to its practical importance in enhancing the safe deployment of neural networks. One of the primary challenges is that models often produce highly confident predictions on OOD data, which undermines the driving principle in OOD detection that the model should only be confident about in-distribution samples. In this work, we propose ReAct -- a simple and effective technique for reducing model overconfidence on OOD data...
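The abstract above stops before the mechanism, so here is a minimal NumPy sketch of the rectified-activation idea as described: clip unusually large penultimate-layer activations, then score inputs with an energy-style statistic. The percentile-based threshold, the energy score, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def react_energy_score(penultimate, weights, bias, clip_threshold):
    """Energy-style OOD score after ReAct-style activation clipping.

    penultimate: (n, d) activations from the layer before the logits
    weights, bias: final linear layer parameters, shapes (d, k) and (k,)
    clip_threshold: scalar c; activations are truncated to min(a, c)
    Higher scores roughly indicate more in-distribution inputs.
    """
    rectified = np.minimum(penultimate, clip_threshold)   # ReAct: cap unusually large activations
    logits = rectified @ weights + bias
    # energy score: log-sum-exp over logits
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).squeeze(1)

# toy usage: pick the clipping threshold as a high percentile of in-distribution activations
rng = np.random.default_rng(0)
in_dist = rng.normal(1.0, 0.5, size=(1000, 16))
ood = rng.normal(1.0, 3.0, size=(1000, 16))        # broader activations stand in for OOD data
W, b = rng.normal(size=(16, 10)), np.zeros(10)
c = np.percentile(in_dist, 90)                      # assumed: percentile-based threshold
print(react_energy_score(in_dist, W, b, c).mean(), react_energy_score(ood, W, b, c).mean())
```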
2. LEARNING SAFE POLICIES WITH EXPERT GUIDANCE
attributed to: Jessie Huang, Fa Wu, Doina Precup, Yang Cai
posted by: KabirKumar
We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems...

3. "CAUSAL SCRUBBING: A METHOD FOR RIGOROUSLY TESTING INTERPRETABILITY HYPOTHESES", AI ALIGNMENT FORUM, 2022
attributed to: Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas [Redwood Research]
posted by: momom2
Summary: This post introduces causal scrubbing, a principled approach for evaluating the quality of mechanistic interpretations. The key idea behind causal scrubbing is to test interpretability hypotheses via behavior-preserving resampling ablations. We apply this method to develop a refined understanding of how a small language model implements induction and how an algorithmic model correctly classifies if a sequence of parentheses is balanced.

4. NATURAL ABSTRACTIONS: KEY CLAIMS, THEOREMS, AND CRITIQUES
attributed to: LawrenceC, Leon Lang, Erik Jenner, John Wentworth
posted by: KabirKumar
TL;DR: We distill John Wentworth’s Natural Abstractions agenda by summarizing its key claims: the Natural Abstraction Hypothesis—many cognitive systems learn to use similar abstractions—and the Redundant Information Hypothesis—a particular mathematical description of natural abstractions. We also formalize proofs for several of its theoretical results. Finally, we critique the agenda’s progress to date, alignment relevance, and current research methodology.

5. COGNITIVE EMULATION: A NAIVE AI SAFETY PROPOSAL
attributed to: Connor Leahy, Gabriel Alfour (Conjecture)
posted by: KabirKumar
This post serves as a signpost for Conjecture’s new primary safety proposal and research direction, which we call Cognitive Emulation (or “CoEm”). The goal of the CoEm agenda is to build predictably boundable systems, not directly aligned AGIs. We believe the former to be a far simpler and useful step towards a full alignment solution. Unfortunately, given that most other actors are racing for as powerful and general AIs as possible, we won’t share much in terms of technical details for now. In the meantime, we still want to share some of our intuitions about this approach. We take no credit for inventing any of these ideas, and see our contributions largely in taking existing ideas seriously and putting them together into a larger whole.[1]

6. SAFE IMITATION LEARNING VIA FAST BAYESIAN REWARD INFERENCE FROM PREFERENCES
attributed to: Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
posted by: KabirKumar
Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference...
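To make the two-stage structure described in entry 6 concrete, here is a hedged sketch: trajectory features are assumed to be precomputed by a pretrained encoder, and a simple random-walk Metropolis sampler draws posterior samples of linear reward weights under a Bradley-Terry style preference likelihood. The function names, the sampler, and the toy data are placeholders rather than the paper's method.

```python
import numpy as np

def preference_reward_posterior(traj_features, prefs, n_samples=2000, step=0.05, seed=0):
    """Sketch of preference-based Bayesian reward inference over a fixed feature encoding.

    traj_features: (n_traj, d) pre-computed trajectory feature sums (assumed given;
                   Bayesian REX obtains these from a pretrained, self-supervised encoder)
    prefs: list of (i, j) pairs meaning trajectory j is preferred over trajectory i
    Returns posterior samples of unit-norm linear reward weights w.
    """
    rng = np.random.default_rng(seed)
    d = traj_features.shape[1]

    def log_likelihood(w):
        returns = traj_features @ w
        # Bradley-Terry style preference likelihood: P(j preferred over i)
        return sum(returns[j] - np.logaddexp(returns[i], returns[j]) for i, j in prefs)

    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    samples, ll_w = [], log_likelihood(w)
    for _ in range(n_samples):
        proposal = w + step * rng.normal(size=d)       # random-walk Metropolis proposal
        proposal /= np.linalg.norm(proposal)
        ll_p = log_likelihood(proposal)
        if np.log(rng.random()) < ll_p - ll_w:
            w, ll_w = proposal, ll_p
        samples.append(w.copy())
    return np.array(samples)

# toy usage: 5 trajectories with 3-dimensional features and a few pairwise preferences
feats = np.array([[0.1, 1.0, 0.0], [0.9, 0.2, 0.3], [0.5, 0.5, 0.5], [1.2, 0.0, 0.1], [0.0, 0.8, 0.9]])
prefs = [(0, 1), (2, 3), (4, 1), (0, 3)]
posterior = preference_reward_posterior(feats, prefs)
print("posterior mean weights:", posterior[500:].mean(axis=0))
```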
7. PRETRAINED TRANSFORMERS IMPROVE OUT-OF-DISTRIBUTION ROBUSTNESS
attributed to: Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song
posted by: KabirKumar
Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for seven NLP datasets by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers' performance declines are substantially smaller. Pretrained transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.

8. ABSTRACTION LEARNING
attributed to: Fei Deng, Jinsheng Ren, Feng Chen
posted by: KabirKumar
There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior research in artificial intelligence either specifies abstraction by human experts, or takes abstraction as a qualitative explanation for the model. This paper aims to learn abstraction directly. We tackle three main challenges: representation, objective function, and learning algorithm. Specifically, we propose a partition structure that contains pre-allocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem. This complete framework is named ONE (Optimization via Network Evolution). In our experiments on MNIST, ONE shows elementary human-like intelligence, including low energy consumption, knowledge sharing, and lifelong learning.
9. AUTONOMOUS INTELLIGENT CYBER-DEFENSE AGENT (AICA) REFERENCE ARCHITECTURE. RELEASE 2.0
attributed to: Alexander Kott, Paul Théron, Martin Drašar, Edlira Dushku, Benoît LeBlanc, Paul Losiewicz, Alessandro Guarino, Luigi Mancini, Agostino Panico, Mauno Pihelgas, Krzysztof Rzadca, Fabio De Gaspari
posted by: KabirKumar
This report - a major revision of its previous release - describes a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 "Intelligent Autonomous Agents for Cyber Defense and Resilience". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. Its objective has been to help accelerate the development and transition to practice of such software agents by producing a reference architecture and technical roadmap.

10. TOWARDS A HUMAN-LIKE OPEN-DOMAIN CHATBOT
attributed to: Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
posted by: KabirKumar
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
11. ADVERSARIAL ROBUSTNESS AS A PRIOR FOR LEARNED REPRESENTATIONS
attributed to: Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry
posted by: KabirKumar
An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks' representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns out that representations learned by robust models address the aforementioned shortcomings and make significant progress towards learning a high-level encoding of inputs. In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features. More broadly, our results indicate adversarial robustness as a promising avenue for improving learned representations. Our code and models for reproducing these results are available at https://git.io/robust-reps

12. A GEOMETRIC PERSPECTIVE ON THE TRANSFERABILITY OF ADVERSARIAL DIRECTIONS
attributed to: Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos
posted by: KabirKumar
State-of-the-art machine learning models frequently misclassify inputs that have been perturbed in an adversarial manner. Adversarial perturbations generated for a given input and a specific classifier often seem to be effective on other inputs and even different classifiers. In other words, adversarial perturbations seem to transfer between different inputs, models, and even different neural network architectures. In this work, we show that in the context of linear classifiers and two-layer ReLU networks, there provably exist directions that give rise to adversarial perturbations for many classifiers and data points simultaneously. We show that these "transferable adversarial directions" are guaranteed to exist for linear separators of a given set, and will exist with high probability for linear classifiers trained on independent sets drawn from the same distribution. We extend our results to large classes of two-layer ReLU networks. We further show that adversarial directions for ReLU networks transfer to linear classifiers while the reverse need not hold, suggesting that adversarial perturbations for more complex models are more likely to transfer to other classifiers.

13. TOWARDS THE FIRST ADVERSARIALLY ROBUST NEURAL NETWORK MODEL ON MNIST
attributed to: Lukas Schott, Jonas Rauber, Matthias Bethge, Wieland Brendel
posted by: KabirKumar
Despite much effort, deep neural networks remain highly susceptible to tiny input perturbations, and even for MNIST, one of the most common toy datasets in computer vision, no neural network model exists for which adversarial perturbations are large and make semantic sense to humans. We show that even the widely recognized and by far most successful defense by Madry et al. (1) overfits on the L-infinity metric (it's highly susceptible to L2 and L0 perturbations), (2) classifies unrecognizable images with high certainty, (3) performs not much better than simple input binarization, and (4) features adversarial perturbations that make little sense to humans. These results suggest that MNIST is far from being solved in terms of adversarial robustness. We present a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions.
14. MOTIVATING THE RULES OF THE GAME FOR ADVERSARIAL EXAMPLE RESEARCH
attributed to: Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, George E. Dahl
posted by: KabirKumar
Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such examples to abuse deployed systems. As a result of these security concerns, there has been a flurry of recent papers proposing algorithms to defend against such malicious perturbations of correctly handled examples. It is unclear how such misclassifications represent a different kind of security problem than other errors, or even other attacker-produced examples that have no specific relationship to an uncorrupted input. In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security.

15. ROBUSTNESS VIA CURVATURE REGULARIZATION, AND VICE VERSA
attributed to: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, Pascal Frossard
posted by: KabirKumar
State-of-the-art classifiers have been shown to be largely vulnerable to adversarial perturbations. One of the most effective strategies to improve robustness is adversarial training. In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. We show in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more "linear" behaviour of the network. Using a locally quadratic approximation, we provide theoretical evidence on the existence of a strong relation between large robustness and small curvature. To further show the importance of reduced curvature for improving the robustness, we propose a new regularizer that directly minimizes curvature of the loss surface, and leads to adversarial robustness that is on par with adversarial training. Besides being a more efficient and principled alternative to adversarial training, the proposed regularizer confirms our claims on the importance of exhibiting quasi-linear behavior in the vicinity of data points in order to achieve robustness.
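As a rough illustration of the curvature-regularization idea in entry 15, the sketch below computes a finite-difference curvature proxy (the change in the input gradient along the gradient direction) that could be added to a training loss. The exact regularizer in the paper may differ; the step size h and the toy logistic model are assumptions.

```python
import numpy as np

def curvature_penalty(grad_fn, x, h=0.01):
    """Finite-difference curvature proxy usable as a regularizer (sketch).

    grad_fn: returns the gradient of the loss with respect to the input x.
    Penalizes || grad(x + h*z) - grad(x) ||^2 along z, the normalized gradient
    direction, which approximates curvature of the loss surface near x.
    """
    g = grad_fn(x)
    z = g / (np.linalg.norm(g) + 1e-12)
    return np.sum((grad_fn(x + h * z) - g) ** 2)

# toy model: logistic loss for a fixed linear classifier, gradient taken w.r.t. the input
w, b, y = np.array([2.0, -1.0]), 0.3, 1.0

def input_grad(x):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))      # sigmoid prediction
    return (p - y) * w                          # d(logistic loss)/dx for label y = 1

x = np.array([0.5, 1.5])
print("curvature proxy at x:", curvature_penalty(input_grad, x))
```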
16. ADVERSARIAL POLICIES: ATTACKING DEEP REINFORCEMENT LEARNING
attributed to: Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
posted by: KabirKumar
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent's observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://adversarialpolicies.github.io/.

17. FORTIFIED NETWORKS: IMPROVING THE ROBUSTNESS OF DEEP NETWORKS BY MODELING THE MANIFOLD OF HIDDEN REPRESENTATIONS
attributed to: Alex Lamb, Jonathan Binas, Anirudh Goyal, Dmitriy Serdyuk, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio
posted by: KabirKumar
Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data which differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks, which fortifies the hidden layers in a deep network by identifying when the hidden states are off of the data manifold, and maps these hidden states back to parts of the data manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to the gradient masking problem; and (iii) show the advantage of doing this fortification in the hidden layers instead of the input space.
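One possible reading of the "fortified layer" in entry 17 is a small denoising autoencoder applied to hidden activations, whose reconstruction both maps states back toward the data manifold and provides an auxiliary loss. The sketch below encodes that reading; the autoencoder form, the layer sizes, and the noise level are assumptions, not details confirmed by the abstract.

```python
import numpy as np

class FortifiedLayer:
    """Sketch of a 'fortified' hidden layer: a small denoising autoencoder over
    hidden activations. The reconstruction error can be added to the training loss
    and also signals when a hidden state is far from the training manifold."""

    def __init__(self, dim, code_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(scale=0.1, size=(dim, code_dim))
        self.dec = rng.normal(scale=0.1, size=(code_dim, dim))

    def forward(self, h, noise_std=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        corrupted = h + noise_std * rng.normal(size=h.shape)   # denoising-autoencoder corruption
        code = np.tanh(corrupted @ self.enc)
        reconstructed = code @ self.dec                        # mapped back toward the manifold
        rec_error = np.mean((reconstructed - h) ** 2)          # auxiliary loss term
        return reconstructed, rec_error

h = np.random.default_rng(1).normal(size=(4, 32))              # a batch of hidden activations
layer = FortifiedLayer(dim=32, code_dim=8)
h_fortified, rec_loss = layer.forward(h)
print("reconstruction error (added to the training objective):", rec_loss)
```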
18. EVALUATING AND UNDERSTANDING THE ROBUSTNESS OF ADVERSARIAL LOGIT PAIRING
attributed to: Logan Engstrom, Andrew Ilyas, Anish Athalye
posted by: KabirKumar
We evaluate the robustness of Adversarial Logit Pairing, a recently proposed defense against adversarial examples. We find that a network trained with Adversarial Logit Pairing achieves 0.6% accuracy in the threat model in which the defense is considered. We provide a brief overview of the defense and the threat models/claims considered, as well as a discussion of the methodology and results of our attack, which may offer insights into the reasons underlying the vulnerability of ALP to adversarial attack.

19. EVALUATING AGENTS WITHOUT REWARDS
attributed to: Brendon Matusch, Jimmy Ba, Danijar Hafner
posted by: KabirKumar
Reinforcement learning has enabled agents to solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D game Minecraft. We find that all three intrinsic objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

20. ALGORITHMIC FAIRNESS FROM A NON-IDEAL PERSPECTIVE
attributed to: Sina Fazelpour, Zachary C. Lipton
posted by: KabirKumar
Inspired by recent breakthroughs in predictive modeling, practitioners in both industry and government have turned to machine learning with hopes of operationalizing predictions to drive automated decisions. Unfortunately, many social desiderata concerning consequential decisions, such as justice or fairness, have no natural formulation within a purely predictive framework. In efforts to mitigate these problems, researchers have proposed a variety of metrics for quantifying deviations from various statistical parities that we might expect to observe in a fair world, and offered a variety of algorithms in attempts to satisfy subsets of these parities or to trade off the degree to which they are satisfied against utility. In this paper, we connect this approach to fair machine learning to the literature on ideal and non-ideal methodological approaches in political philosophy. The ideal approach requires positing the principles according to which a just world would operate. In the most straightforward application of ideal theory, one supports a proposed policy by arguing that it closes a discrepancy between the real and the perfectly just world.
21. IDENTIFYING AND CORRECTING LABEL BIAS IN MACHINE LEARNING
attributed to: Heinrich Jiang, Ofir Nachum
posted by: KabirKumar
Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.

22. LEARNING NOT TO LEARN: TRAINING DEEP NEURAL NETWORKS WITH BIASED DATA
attributed to: Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim
posted by: KabirKumar
We propose a novel regularization algorithm to train deep neural networks, in which data at training time is severely biased. Since a neural network efficiently learns data distribution, a network is likely to learn the bias information to categorize input data. It leads to poor performance at test time, if the bias is, in fact, irrelevant to the categorization. In this paper, we formulate a regularization loss based on mutual information between feature embedding and bias. Based on the idea of minimizing this mutual information, we propose an iterative algorithm to unlearn the bias information. We employ an additional network to predict the bias distribution and train the network adversarially against the feature embedding network. At the end of learning, the bias prediction network is not able to predict the bias not because it is poorly trained, but because the feature embedding network successfully unlearns the bias information. We also demonstrate quantitative and qualitative experimental results which show that our algorithm effectively removes the bias information from feature embedding.
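Entry 22 describes training a bias predictor adversarially against the feature extractor to drive down the mutual information between features and bias. The PyTorch sketch below shows one common way to set up such a two-player loop; the entropy-maximization surrogate for the mutual-information term, the coefficient 0.1, and all dimensions are assumptions rather than the paper's exact losses.

```python
import torch
import torch.nn as nn

# Sketch: feature extractor f, task head c, and bias predictor b.
# b is trained to predict the bias from features; f is trained to solve the task
# while making b maximally uncertain about the bias, which pushes down the
# dependence between features and bias. Dimensions are toy values.
f = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
c = nn.Linear(16, 3)     # task classifier (3 classes)
b = nn.Linear(16, 2)     # bias predictor (2 bias groups)
opt_fc = torch.optim.Adam(list(f.parameters()) + list(c.parameters()), lr=1e-3)
opt_b = torch.optim.Adam(b.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
bias = torch.randint(0, 2, (64,))

for _ in range(100):
    # 1) train the bias predictor on current (detached) features
    opt_b.zero_grad()
    ce(b(f(x).detach()), bias).backward()
    opt_b.step()

    # 2) train features + task head: do the task, and make the bias predictor uncertain
    opt_fc.zero_grad()
    feats = f(x)
    bias_probs = b(feats).softmax(dim=1)
    entropy = -(bias_probs * bias_probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    loss = ce(c(feats), y) - 0.1 * entropy     # the weight 0.1 is an arbitrary choice
    loss.backward()
    opt_fc.step()
```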
23. COLLABORATING WITH HUMANS WITHOUT HUMAN DATA
attributed to: DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett
posted by: KabirKumar
Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective.

24. LEGIBLE NORMATIVITY FOR AI ALIGNMENT: THE VALUE OF SILLY RULES
attributed to: Dylan Hadfield-Menell, McKane Andrus, Gillian K. Hadfield
posted by: KabirKumar
It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior -- social norms and laws. But human laws and norms are complex and culturally varied systems; in many cases agents will have to learn the rules. This requires autonomous agents to have models of how human rule systems work so that they can make reliable predictions about rules. In this paper we contribute to the building of such models by analyzing an overlooked distinction between important rules and what we call silly rules -- rules with no discernible direct impact on welfare. We show that silly rules render a normative system both more robust and more adaptable in response to shocks to perceived stability. They make normativity more legible for humans, and can increase legibility for AI systems as well. For AI systems to integrate into human normative systems, we suggest, it may be important for them to have models that include representations of silly rules.

25. TANKSWORLD: A MULTI-AGENT ENVIRONMENT FOR AI SAFETY RESEARCH
attributed to: Corban G. Rivera, Olivia Lyons, Arielle Summitt, Ayman Fatima, Ji Pak, William Shao, Robert Chalmers, Aryeh Englander, Edward W. Staley, I-Jeng Wang, Ashley J. Llorens
posted by: KabirKumar
The ability to create artificial intelligence (AI) capable of performing complex tasks is rapidly outpacing our ability to ensure the safe and assured operation of AI-enabled systems. Fortunately, a landscape of AI safety research is emerging in response to this asymmetry, and yet there is a long way to go. In particular, recent simulation environments created to illustrate AI safety risks are relatively simple or narrowly focused on a particular issue. Hence, we see a critical need for AI safety research environments that abstract essential aspects of complex real-world applications. In this work, we introduce the AI safety TanksWorld as an environment for AI safety research with three essential aspects: competing performance objectives, human-machine teaming, and multi-agent competition. The AI safety TanksWorld aims to accelerate the advancement of safe multi-agent decision-making algorithms by providing a software framework to support competitions with both system performance and safety objectives. As a work in progress, this paper introduces our research objectives and learning environment, with reference code and baseline performance metrics to follow in a future work.
26. ON GRADIENT-BASED LEARNING IN CONTINUOUS GAMES
attributed to: Eric Mazumdar, Lillian J. Ratliff, S. Shankar Sastry
posted by: KabirKumar
We formulate a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical systems theory. For both general-sum and potential games, we characterize a non-negligible subset of the local Nash equilibria that will be avoided if each agent employs a gradient-based learning algorithm. We also shed light on the issue of convergence to non-Nash strategies in general- and zero-sum games, which may have no relevance to the underlying game, and arise solely due to the choice of algorithm. The existence and frequency of such strategies may explain some of the difficulties encountered when using gradient descent in zero-sum games as, e.g., in the training of generative adversarial networks. To reinforce the theoretical contributions, we provide empirical results that highlight the frequency of linear quadratic dynamic games (a benchmark for multi-agent reinforcement learning) that admit global Nash equilibria that are almost surely avoided by policy gradient.

27. REINFORCEMENT LEARNING UNDER THREATS
attributed to: Victor Gallego, Roi Naveiro, David Rios Insua
posted by: KabirKumar
In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries while the agent learns.
28. LEARNING REPRESENTATIONS BY HUMANS, FOR HUMANS
attributed to: Sophie Hilgard, Nir Rosenfeld, Mahzarin R. Banaji, Jack Cao, David C. Parkes
posted by: KabirKumar
When machine predictors can achieve higher performance than the human decision-makers they support, improving the performance of human decision-makers is often conflated with improving machine accuracy. Here we propose a framework to directly support human decision-making, in which the role of machines is to reframe problems rather than to prescribe actions through prediction. Inspired by the success of representation learning in improving performance of machine predictors, our framework learns human-facing representations optimized for human performance. This "Mind Composed with Machine" framework incorporates a human decision-making model directly into the representation learning paradigm and is trained with a novel human-in-the-loop training procedure. We empirically demonstrate the successful application of the framework to various tasks and representational forms.

29. LEARNING TO UNDERSTAND GOAL SPECIFICATIONS BY MODELLING REWARD
attributed to: Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette
posted by: KabirKumar
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations---and for instructions---not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.
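Entry 29 trains agents with rewards from a learned reward model rather than from the environment. The sketch below shows the shape of that split under simple assumptions: a logistic reward model fit to expert "task completed" examples, then queried in place of the environment reward. The encoding and the model are stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(instruction, state):
    # toy joint encoding; a real system would embed language and observations
    return np.concatenate([instruction, state, instruction * state])

def train_reward_model(examples, labels, lr=0.1, steps=500):
    """Fit a logistic-regression reward model on (instruction, state) encodings."""
    X = np.stack(examples)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - labels) / len(labels)   # gradient step on the logistic loss
    return w

# expert data: instruction/state pairs labelled "task completed" (1) or not (0)
instr = rng.normal(size=(200, 4))
pos_states = instr + 0.1 * rng.normal(size=(200, 4))   # completed states resemble the instruction
neg_states = rng.normal(size=(200, 4))
X = [encode(i, s) for i, s in zip(instr, pos_states)] + [encode(i, s) for i, s in zip(instr, neg_states)]
y = np.concatenate([np.ones(200), np.zeros(200)])
w = train_reward_model(X, y)

def learned_reward(instruction, state):
    # used by the RL agent in place of the environment reward
    return 1.0 / (1.0 + np.exp(-encode(instruction, state) @ w))

print(learned_reward(instr[0], pos_states[0]), learned_reward(instr[0], neg_states[0]))
```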
30. I KNOW WHAT YOU MEANT: LEARNING HUMAN OBJECTIVES BY (UNDER)ESTIMATING THEIR CHOICE SET
attributed to: Ananth Jonnavittula, Dylan P. Losey
posted by: KabirKumar
Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what it is their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since often these individuals are unable to show the robot their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior is not what the human really meant! Here we propose a different approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works -- which err towards overestimating the human's capabilities -- here we err towards underestimating what the human can input (i.e., their choice set). Our theoretical analysis proves that underestimating the human's choice set is risk-averse, with better worst-case performance than overestimating.

31. LEARNING TO COMPLEMENT HUMANS
attributed to: Bryan Wilder, Eric Horvitz, Ece Kamar
posted by: KabirKumar
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks. To date, systems aimed at complementing the skills of people have employed models trained to be as accurate as possible in isolation. We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams by considering the distinct abilities of people and machines. The goal is to focus machine learning on problem instances that are difficult for humans, while recognizing instances that are difficult for the machine and seeking human input on them. We demonstrate in two real-world domains (scientific discovery and medical diagnosis) that human-machine teams built via these methods outperform the individual performance of machines and people. We then analyze conditions under which this complementarity is strongest, and which training methods amplify it. Taken together, our work provides the first systematic investigation of how machine learning systems can be trained to complement human reasoning.

32. HEURISTIC APPROACHES FOR GOAL RECOGNITION IN INCOMPLETE DOMAIN MODELS
attributed to: Ramon Fraga Pereira, Felipe Meneguzzi
posted by: KabirKumar
Recent approaches to goal recognition have progressively relaxed the assumptions about the amount and correctness of domain knowledge and available observations, yielding accurate and efficient algorithms. These approaches, however, assume completeness and correctness of the domain theory against which their algorithms match observations: this is too strong for most real-world domains. In this paper, we develop goal recognition techniques that are capable of recognizing goals using incomplete (and possibly incorrect) domain theories. We show the efficiency and accuracy of our approaches empirically against a large dataset of goal and plan recognition problems with incomplete domains.
33. LEARNING REWARDS FROM LINGUISTIC FEEDBACK
attributed to: Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths
posted by: KabirKumar
We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, using aspect-based sentiment analysis to decompose feedback into sentiment about the features of a Markov decision process. We then perform an analogue of inverse reinforcement learning, regressing the sentiment on the features to infer the teacher's latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards. We then repeat our initial experiment and pair them with human teachers. All three successfully learn from interactive human feedback.

34. THE EMPATHIC FRAMEWORK FOR TASK LEARNING FROM IMPLICIT HUMAN FEEDBACK
attributed to: Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox
posted by: KabirKumar
Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that need to be attentively and intentionally provided. In this paper, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as reward, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task...

35. PARENTING: SAFE REINFORCEMENT LEARNING FROM HUMAN INPUT
attributed to: Christopher Frye, Ilya Feige
posted by: KabirKumar
Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our Parenting algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it "matures", and that policies learnt through Parenting are generalisable to new environments.
36. CONSTRAINED POLICY IMPROVEMENT FOR SAFE AND EFFICIENT REINFORCEMENT LEARNING
attributed to: Elad Sarafian, Aviv Tamar, Sarit Kraus
posted by: KabirKumar
We propose a policy improvement algorithm for Reinforcement Learning (RL) which is called Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite past experience data. Greedy policies or even constrained policy optimization algorithms which ignore these errors may suffer from an improvement penalty (i.e. a negative policy improvement). To minimize the improvement penalty, the RBI idea is to attenuate rapid policy changes of low probability actions which were less frequently sampled. This approach is shown to avoid catastrophic performance degradation and reduce regret when learning from a batch of past experience. Through a two-armed bandit with Gaussian distributed rewards example, we show that it also increases data efficiency when the optimal action has a high variance. We evaluate RBI in two tasks in the Atari Learning Environment: (1) learning from observations of multiple behavior policies and (2) iterative RL.

37. TOWARDS EMPATHIC DEEP Q-LEARNING
attributed to: Bart Bussmann, Jacqueline Heinerman, Joel Lehman
posted by: KabirKumar
As reinforcement learning (RL) scales to solve increasingly complex tasks, interest continues to grow in the fields of AI safety and machine ethics. As a contribution to these fields, this paper introduces an extension to Deep Q-Networks (DQNs), called Empathic DQN, that is loosely inspired both by empathy and the golden rule ("Do unto others as you would have them do unto you"). Empathic DQN aims to help mitigate negative side effects to other agents resulting from myopic goal-directed behavior. We assume a setting where a learning agent coexists with other independent agents (who receive unknown rewards), where some types of reward (e.g. negative rewards from physical harm) may generalize across agents. Empathic DQN combines the typical (self-centered) value with the estimated value of other agents, by imagining (by its own standards) the value of it being in the other's situation (by considering constructed states where both agents are swapped). Proof-of-concept results in two gridworld environments highlight the approach's potential to decrease collateral harms.
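Entry 37's abstract describes combining the agent's own value with the value it would assign to being in the other agent's place (via state swapping). Below is a minimal sketch of that combination with a tabular Q stand-in for the DQN; the coefficient beta and the max-over-actions estimate of the other's value are assumptions, not the paper's exact formulation.

```python
import numpy as np

def empathic_q(q_table, state, swapped_state, beta=0.5):
    """Sketch of an Empathic DQN-style value combination.

    q_table: dict mapping state -> array of Q-values over actions (tabular stand-in)
    state: the agent's own state; swapped_state: the constructed state with agents swapped
    Returns combined action values:
        Q_empathic(s, a) = Q(s, a) + beta * V(swap(s)),
    where V(swap(s)) = max_a' Q(swap(s), a') is the agent's own estimate of how good it
    would be to stand in the other agent's place.
    """
    other_value = np.max(q_table[swapped_state])
    return q_table[state] + beta * other_value

# toy usage: two states, three actions
q = {"s": np.array([1.0, 0.5, -0.2]), "s_swapped": np.array([0.0, -1.0, 0.3])}
print("greedy action:", int(np.argmax(empathic_q(q, "s", "s_swapped", beta=0.5))))
```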
38. BUILDING ETHICS INTO ARTIFICIAL INTELLIGENCE
attributed to: Han Yu, Zhiqi Shen, Chunyan Miao, Cyril Leung, Victor R. Lesser, Qiang Yang
posted by: KabirKumar
As artificial intelligence (AI) systems become increasingly ubiquitous, the topic of AI governance for ethical decision-making by AI has captured public imagination. Within the AI research community, this topic remains less familiar to many researchers. In this paper, we complement existing surveys, which largely focused on the psychological, social and legal discussions of the topic, with an analysis of recent advances in technical solutions for AI governance. By reviewing publications in leading AI conferences including AAAI, AAMAS, ECAI and IJCAI, we propose a taxonomy which divides the field into four areas: 1) exploring ethical dilemmas; 2) individual ethical decision frameworks; 3) collective ethical decision frameworks; and 4) ethics in human-AI interactions. We highlight the intuitions and key techniques used in each approach, and discuss promising future research directions towards successful integration of ethical AI systems into human societies.

39. REINFORCEMENT LEARNING UNDER MORAL UNCERTAINTY
attributed to: Adrien Ecoffet, Joel Lehman
posted by: KabirKumar
An ambitious goal for machine learning is to create agents that behave ethically: the capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed, e.g. fully autonomous vehicles will encounter charged moral decisions that complicate their deployment. While ethical agents could be trained by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement about the nature of morality. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one's credence is split across several plausible ethical theories. This paper translates such insights to the field of reinforcement learning, proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
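Entry 39 frames acting under moral uncertainty as spreading credence over several ethical theories. The simplest aggregation consistent with that framing is an expected value over credence-weighted, scale-normalized theory rewards, sketched below; the paper compares more refined schemes, so this is an illustration of the setup rather than its method.

```python
import numpy as np

def moral_uncertainty_values(theory_rewards, credences):
    """Combine per-theory action values under moral uncertainty (sketch).

    theory_rewards: (n_theories, n_actions) reward each theory assigns to each action
    credences: (n_theories,) probabilities summing to 1
    Each theory's rewards are rescaled to [0, 1] so no theory dominates purely by scale,
    then averaged with the credences as weights.
    """
    rewards = np.asarray(theory_rewards, dtype=float)
    lo = rewards.min(axis=1, keepdims=True)
    spread = rewards.max(axis=1, keepdims=True) - lo
    normalized = (rewards - lo) / np.where(spread == 0, 1, spread)
    return np.asarray(credences) @ normalized

# e.g. a utilitarian-style theory vs. a rule-based theory scoring three candidate actions
combined = moral_uncertainty_values([[10.0, 2.0, 5.0], [0.0, 1.0, 1.0]], [0.6, 0.4])
print("combined action values:", combined, "-> chosen action:", int(np.argmax(combined)))
```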
40. AVE: ASSISTANCE VIA EMPOWERMENT
attributed to: Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan
posted by: KabirKumar
One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding its use in real time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training.

41. PLANNING WITH UNCERTAIN SPECIFICATIONS (PUNS)
attributed to: Ankit Shah, Shen Li, Julie Shah
posted by: KabirKumar
Reward engineering is crucial to high performance in reinforcement learning systems. Prior research into reward design has largely focused on Markovian functions representing the reward. While there has been research into expressing non-Markov rewards as linear temporal logic (LTL) formulas, this has focused on task specifications directly defined by the user. However, in many real-world applications, task specifications are ambiguous, and can only be expressed as a belief over LTL formulas. In this paper, we introduce planning with uncertain specifications (PUnS), a novel formulation that addresses the challenge posed by non-Markovian specifications expressed as beliefs over LTL formulas. We present four criteria that capture the semantics of satisfying a belief over specifications for different applications, and analyze the qualitative implications of these criteria within a synthetic domain. We demonstrate the existence of an equivalent Markov decision process (MDP) for any instance of PUnS. Finally, we demonstrate our approach on the real-world task of setting a dinner table automatically with a robot that inferred task specifications from human demonstrations.

42. PENALIZING SIDE EFFECTS USING STEPWISE RELATIVE REACHABILITY
attributed to: Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg
posted by: KabirKumar
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two components: a baseline state and a measure of deviation from this baseline state. We argue that some of these incentives arise from the choice of baseline, and others arise from the choice of deviation measure. We introduce a new variant of the stepwise inaction baseline and a new deviation measure based on relative reachability of states. The combination of these design choices avoids the given undesirable incentives, while simpler baselines and the unreachability measure fail. We demonstrate this empirically by comparing different combinations of baseline and deviation measure choices on a set of gridworld experiments designed to illustrate possible bad incentives.
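A toy rendering of entry 42's deviation measure: compare which states remain reachable after the agent's action against those reachable under a stepwise inaction baseline, and penalize the reduction. The undiscounted, set-based reachability and the tiny deterministic environment below are simplifications of the paper's formulation.

```python
def reachable(transitions, start, horizon):
    """States reachable from `start` within `horizon` steps in a deterministic toy MDP.
    transitions: dict state -> {action: next_state}."""
    frontier, seen = {start}, {start}
    for _ in range(horizon):
        frontier = {transitions[s][a] for s in frontier for a in transitions[s]} - seen
        seen |= frontier
    return seen

def relative_reachability_penalty(transitions, state_after_action, state_after_noop, horizon=5):
    """Sketch of a relative-reachability style side-effect penalty: the fraction of states
    reachable under the stepwise inaction baseline that are no longer reachable after the
    agent's action."""
    baseline = reachable(transitions, state_after_noop, horizon)
    actual = reachable(transitions, state_after_action, horizon)
    if not baseline:
        return 0.0
    return len(baseline - actual) / len(baseline)

# toy environment: breaking a vase (state "B") makes the intact-vase state unreachable
T = {
    "start": {"noop": "start", "push": "B"},
    "B": {"noop": "B"},                      # irreversible: once broken, stays broken
}
print(relative_reachability_penalty(T, state_after_action="B", state_after_noop="start"))
```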
...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 43 LEARNING TO BE SAFE: DEEP RL WITH A SAFETY CRITIC attributed to: Krishnan Srinivasan, Benjamin Eysenbach, Sehoon Ha, Jie Tan, Chelsea Finn posted by: KabirKumar Safety is an essential component for deploying reinforcement learning (RL) algorithms in real-world scenarios,... Safety is an essential component for deploying reinforcement learning (RL) algorithms in real-world scenarios, and is critical during the learning process itself. A natural first approach toward safe RL is to manually specify constraints on the policy's behavior. However, just as learning has enabled progress in large-scale development of AI systems, learning safety specifications may also be necessary to ensure safety in messy open-world environments where manual safety specifications cannot scale. Akin to how humans learn incrementally starting in child-safe environments, we propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors when learning new, modified tasks. We empirically study this form of safety-constrained transfer learning in three challenging domains: simulated navigation, quadruped locomotion, and dexterous in-hand manipulation. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 44 RECOVERY RL: SAFE REINFORCEMENT LEARNING WITH LEARNED RECOVERY ZONES attributed to: Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg posted by: KabirKumar Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in unc... Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based obstacle avoidance task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping and find that Recovery RL outperforms the next best prior method across all domains. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 45 CONSERVATIVE AGENCY VIA ATTAINABLE UTILITY PRESERVATION attributed to: Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli posted by: KabirKumar Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an ... 
Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes optimization of the correctly specified reward function, then correction is futile. For example, a robotic factory assistant could break expensive equipment due to a reward misspecification; even if the designers immediately correct the reward function, the damage is done. To mitigate this risk, we introduce an approach that balances optimization of the primary reward function with preservation of the ability to optimize auxiliary reward functions. Surprisingly, even when the auxiliary reward functions are randomly generated and therefore uninformative about the correctly specified reward function, this approach induces conservative, effective behavior. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 46 SAFE REINFORCEMENT LEARNING WITH MODEL UNCERTAINTY ESTIMATES attributed to: Björn Lütjens, Michael Everett, Jonathan P. How posted by: KabirKumar Many current autonomous systems are being designed with a strong reliance on black box predictions from deep n... Many current autonomous systems are being designed with a strong reliance on black box predictions from deep neural networks (DNNs). However, DNNs tend to be overconfident in predictions on unseen data and can give unpredictable results for far-from-distribution test data. The importance of predictions that are robust to this distributional shift is evident for safety-critical applications, such as collision avoidance around pedestrians. Measures of model uncertainty can be used to identify unseen data, but the state-of-the-art extraction methods such as Bayesian neural networks are mostly intractable to compute. This paper uses MC-Dropout and Bootstrapping to give computationally tractable and parallelizable uncertainty estimates. The methods are embedded in a Safe Reinforcement Learning framework to form uncertainty-aware navigation around pedestrians. The result is a collision avoidance policy that knows what it does not know and cautiously avoids pedestrians that exhibit unseen behavior. The policy is demonstrated in simulation to be more robust to novel observations and take safer actions than an uncertainty-unaware baseline. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 47 AVOIDING NEGATIVE SIDE EFFECTS DUE TO INCOMPLETE KNOWLEDGE OF AI SYSTEMS attributed to: Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar posted by: KabirKumar Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the en... Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model -- handcrafted or machine acquired -- is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of an agent's actions is critical to improve the safety and reliability of autonomous systems. 
Mitigating negative side effects is an emerging research topic that is attracting increased attention due to the rapid growth in the deployment of AI systems and their broad societal impacts. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 48 AVOIDING SIDE EFFECTS IN COMPLEX ENVIRONMENTS attributed to: Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli posted by: KabirKumar Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penal... Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid many side effects. Videos and code are available at https://avoiding-side-effects.github.io/. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 49 SAFETY AWARE REINFORCEMENT LEARNING (SARL) attributed to: Santiago Miret, Somdeb Majumdar, Carroll Wainwright posted by: KabirKumar As reinforcement learning agents become increasingly integrated into complex, real-world environments, designi... As reinforcement learning agents become increasingly integrated into complex, real-world environments, designing for safety becomes a critical consideration. We specifically focus on researching scenarios where agents can cause undesired side effects while executing a policy on a primary task. Since one can define multiple tasks for a given environment dynamics, there are two important challenges. First, we need to abstract the concept of safety that applies broadly to that environment independent of the specific task being executed. Second, we need a mechanism for the abstracted notion of safety to modulate the actions of agents executing different policies to minimize their side-effects. In this work, we propose Safety Aware Reinforcement Learning (SARL) - a framework where a virtual safe agent modulates the actions of a main reward-based agent to minimize side effects. The safe agent learns a task-independent notion of safety for a given environment. The main agent is then trained with a regularization loss given by the distance between the native action probabilities of the two agents.. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 50 SAFE OPTION-CRITIC: LEARNING SAFETY IN THE OPTION-CRITIC ARCHITECTURE attributed to: Arushi Jain, Khimya Khetarpal, Doina Precup posted by: KabirKumar Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for pra... Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also, facilitates a better understanding of an agent's decisions. 
We tackle this problem in the options framework, a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour as safe that avoids regions of state-space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviour of the proposed approach in a tabular grid-world, continuous-state puddle-world, and three games from the Arcade Learning Environment: Ms.Pacman, Amidar, and Q*Bert. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 51 STOVEPIPING AND MALICIOUS SOFTWARE: A CRITICAL REVIEW OF AGI CONTAINMENT attributed to: Jason M. Pittman, Jesus P. Espinoza, Courtney Crosby posted by: KabirKumar Awareness of the possible impacts associated with artificial intelligence has risen in proportion to progress ... Awareness of the possible impacts associated with artificial intelligence has risen in proportion to progress in the field. While there are tremendous benefits to society, many argue that there are just as many, if not more, concerns related to advanced forms of artificial intelligence. Accordingly, research into methods to develop artificial intelligence safely is increasingly important. In this paper, we provide an overview of one such safety paradigm: containment with a critical lens aimed toward generative adversarial networks and potentially malicious artificial intelligence. Additionally, we illuminate the potential for a developmental blindspot in the stovepiping of containment mechanisms. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 52 REWARD ESTIMATION FOR VARIANCE REDUCTION IN DEEP REINFORCEMENT LEARNING attributed to: Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent Francois-Lavet, Joelle Pineau posted by: KabirKumar Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. Howev... Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, introduction of corrupt or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochasticity of the reward signal can be especially problematic in robotics, where goal specification can be particularly difficult for complex tasks. While many variance reduction techniques have been studied to improve the robustness of the RL process, handling such stochastic or corrupted reward structures remains difficult. As an alternative for handling this scenario in model-free RL methods, we suggest using an estimator for both rewards and value functions. 
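A minimal sketch of how such a reward estimator might plug into tabular Q-learning (an illustration under simplified assumptions, not the authors' implementation): the TD target is built from a running-mean estimate of the reward rather than the raw, possibly corrupted sample.

# Illustrative reward-estimating Q-learner: noisy reward samples update a
# per-(state, action) running mean, and bootstrapping uses that estimate.

import numpy as np

class RewardEstimatingQLearner:
    def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.1):
        self.Q = np.zeros((n_states, n_actions))
        self.r_hat = np.zeros((n_states, n_actions))   # learned reward estimate
        self.counts = np.zeros((n_states, n_actions))
        self.gamma, self.alpha = gamma, alpha

    def update(self, s, a, noisy_reward, s_next, done):
        # Running-mean update of the reward estimator from the observed sample.
        self.counts[s, a] += 1
        self.r_hat[s, a] += (noisy_reward - self.r_hat[s, a]) / self.counts[s, a]
        # TD target uses the *estimated* reward, reducing target variance.
        target = self.r_hat[s, a] + (0.0 if done else self.gamma * self.Q[s_next].max())
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])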
We demonstrate that this improves performance under corrupted stochastic rewards in both the tabular and non-linear function approximation settings for a variety of noise types and environments. The use of reward estimation is a robust and easy-to-implement improvement for handling corrupted reward signals in model-free RL. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 53 SMOOTHING POLICIES AND SAFE POLICY GRADIENTS attributed to: Matteo Papini, Matteo Pirotta, Marcello Restelli posted by: KabirKumar Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinfor... Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics. However, the trial-and-error nature of these methods poses safety issues whenever the learning process itself must be performed on a physical system or involves any form of human-computer interaction. In this paper, we address a specific safety formulation, where both goals and dangers are encoded in a scalar reward signal and the learning agent is constrained to never worsen its performance, measured as the expected sum of rewards. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify meta-parameter schedules that guarantee monotonic improvement with high probability. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 54 REPRESENTATION LEARNING WITH CONTRASTIVE PREDICTIVE CODING attributed to: Aaron van den Oord, Yazhe Li, Oriol Vinyals posted by: KabirKumar While supervised learning has enabled great progress in many applications, unsupervised learning has not seen ... While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 55 ON VARIATIONAL BOUNDS OF MUTUAL INFORMATION attributed to: Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. 
Alemi, George Tucker posted by: KabirKumar Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bound... Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remains unclear. In this work, we unify these recent developments in a single framework. We find that the existing variational lower bounds degrade when the MI is large, exhibiting either high bias or high variance. To address this problem, we introduce a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance. On high-dimensional, controlled problems, we empirically characterize the bias and variance of the bounds and their gradients and demonstrate the effectiveness of our new bounds for estimation and representation learning. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 56 CERTIFIED DEFENSES AGAINST ADVERSARIAL EXAMPLES attributed to: Aditi Raghunathan, Jacob Steinhardt, Percy Liang posted by: KabirKumar While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy ... While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most \epsilon = 0.1 can cause more than 35% test error. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 57 NEUROSYMBOLIC REINFORCEMENT LEARNING WITH FORMALLY VERIFIED EXPLORATION attributed to: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri posted by: KabirKumar We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in co... We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. 
Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 58 EMBEDDING ETHICAL PRIORS INTO AI SYSTEMS: A BAYESIAN APPROACH posted by: RamiZer Artificial Intelligence (AI) systems have significant potential to affect the lives of individuals and societi... Artificial Intelligence (AI) systems have significant potential to affect the lives of individuals and societies. As these systems are being increasingly used in decision-making processes, it has become crucial to ensure that they make ethically sound judgments. This paper proposes a novel framework for embedding ethical priors into AI, inspired by the Bayesian approach to machine learning. We propose that ethical assumptions and beliefs can be incorporated as Bayesian priors, shaping the AI’s learning and reasoning process in a similar way to humans’ inborn moral intuitions. This approach, while complex, provides a promising avenue for advancing ethically aligned AI systems. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 59 BOTTOM-UP VIRTUE ETHICS: A NEW APPROACH TO ETHICAL AI posted by: RamiZer This article explores the concept and potential application of bottom-up virtue ethics as an approach to insti... This article explores the concept and potential application of bottom-up virtue ethics as an approach to instilling ethical behavior in artificial intelligence (AI) systems. We argue that by training machine learning models to emulate virtues such as honesty, justice, and compassion, we can cultivate positive traits and behaviors based on ideal human moral character. This bottom-up approach contrasts with traditional top-down programming of ethical rules, focusing instead on experiential learning. Although this approach presents its own challenges, it offers a promising avenue for the development of more ethically aligned AI systems. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 60 ALIGNING AI SYSTEMS TO HUMAN VALUES AND ETHICS posted by: RamiZer As artificial intelligence rapidly advances, ensuring alignment with moral values and ethics becomes imperativ... As artificial intelligence rapidly advances, ensuring alignment with moral values and ethics becomes imperative. This article provides a comprehensive overview of techniques to embed human values into AI. Interactive learning, crowdsourcing, uncertainty modeling, oversight mechanisms, and conservative system design are analyzed in-depth. Respective limitations are discussed and mitigation strategies proposed. A multi-faceted approach combining the strengths of these complementary methods promises safer development of AI that benefits humanity in accordance with our ideals. 
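As a concrete illustration of the "Embedding Ethical Priors into AI Systems" proposal above, here is a minimal sketch of an ethical assumption entering as an informative Bayesian prior. The Beta-Bernoulli model, the action category, and the numbers are hypothetical choices made for illustration, not taken from the post.

# Illustrative Beta-Bernoulli "ethical prior": an action type starts with a
# strong prior against acceptability, so the posterior moves only slowly as
# human feedback accumulates.

from dataclasses import dataclass

@dataclass
class EthicalBelief:
    alpha: float   # prior pseudo-counts for "acceptable" judgements
    beta: float    # prior pseudo-counts for "unacceptable" judgements

    def update(self, acceptable: bool) -> None:
        # Standard conjugate update from one human judgement.
        if acceptable:
            self.alpha += 1
        else:
            self.beta += 1

    def p_acceptable(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# A strong prior against deception: several "it was fine" signals barely move it.
deception = EthicalBelief(alpha=1.0, beta=20.0)
print(deception.p_acceptable())        # ~0.05 under the prior alone
for _ in range(5):
    deception.update(acceptable=True)
print(deception.p_acceptable())        # still only ~0.23 after 5 positive signals

The design choice being illustrated is that the prior, not the data alone, determines how much evidence is needed before the system treats a behaviour as permissible.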
...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 61 ROBUSTIFYING AI SYSTEMS AGAINST DISTRIBUTIONAL SHIFT posted by: RamiZer Distributional shift poses a significant challenge for deploying and maintaining AI systems. As the real-world... Distributional shift poses a significant challenge for deploying and maintaining AI systems. As the real-world distributions that models are applied to evolve over time, performance can deteriorate. This article examines techniques and best practices for improving model robustness to distributional shift and enabling rapid adaptation when it occurs. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 62 A HYBRID APPROACH TO ENHANCING INTERPRETABILITY IN AI SYSTEMS posted by: RamiZer Interpretability in AI systems is fast becoming a critical requirement in the industry. The proposed Hybrid Ex... Interpretability in AI systems is fast becoming a critical requirement in the industry. The proposed Hybrid Explainability Model (HEM) integrates multiple interpretability techniques, including Feature Importance Visualization, Model Transparency Tools, and Counterfactual Explanations, offering a comprehensive understanding of AI model behavior. This article elaborates on the specifics of implementing HEM, addresses potential counter-arguments, and provides rebuttals to these counterpoints. The HEM approach aims to deliver a holistic understanding of AI decision-making processes, fostering improved accountability, trust, and safety in AI applications. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 63 ENHANCING CORRIGIBILITY IN AI SYSTEMS THROUGH ROBUST FEEDBACK LOOPS posted by: RamiZer This article proposes a detailed framework for a robust feedback loop to enhance corrigibility. The ability to... This article proposes a detailed framework for a robust feedback loop to enhance corrigibility. The ability to continuously learn and correct errors is critical for safe and beneficial AI, but developing corrigible systems comes with significant technical and ethical challenges. The feedback loop outlined involves gathering user input, interpreting feedback contextually, enabling AI actions and learning, confirming changes, and iterative improvement. The article analyzes potential limitations of this approach and provides detailed examples of implementation methods using advanced natural language processing, reinforcement learning, and adversarial training techniques. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 64 AUTONOMOUS ALIGNMENT OVERSIGHT FRAMEWORK (AAOF) posted by: RamiZer To align advanced AIs, an ensemble of diverse, transparent Overseer AIs will independently monitor the target ... To align advanced AIs, an ensemble of diverse, transparent Overseer AIs will independently monitor the target AI and provide granular assessments on its alignment with constitution, human values, ethics, and safety. Overseer interventions will be incremental and subject to human oversight. The system will be implemented cautiously, with extensive testing to validate capabilities. 
Alignment will be treated as an ongoing collaborative process between humans, Overseers, and the target AI, leveraging complementary strengths through open dialog. Continuous vigilance, updating of definitions, and contingency planning will be required to address inevitable uncertainties and risks. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 65 SUPPLEMENTARY ALIGNMENT INSIGHTS THROUGH A HIGHLY CONTROLLED SHUTDOWN INCENTIVE posted by: RamiZer My proposal entails constructing a tightly restricted AI subsystem with the sole capability of attempting to s... My proposal entails constructing a tightly restricted AI subsystem with the sole capability of attempting to safely shut itself down in order to probe, in an isolated manner, potential vulnerabilities in alignment techniques and then improve them. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 66 CORRIGIBILITY VIA MULTIPLE ROUTES attributed to: Jan Kulveit posted by: tori[she/her] Use multiple routes to induce 'corrigibility' by using principles which counteract instrumental convergence (e... Use multiple routes to induce 'corrigibility' by using principles which counteract instrumental convergence (e.g. disutility from resource acquisition by a mutual information measure between the AI and distant parts of the environment ), by counteracting unbounded rationality (satisficing, myopia, etc.), with 'traps' like ontological uncertainty about the level of simulation (e.g. having uncertainty about whether it is in training or deployment), human oversight, and interpretability (e.g. an independent 'translator'). ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 67 AVOIDING TAMPERING INCENTIVES IN DEEP RL VIA DECOUPLED APPROVAL attributed to: Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg posted by: KabirKumar How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the a... How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism. We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure. For a natural class of corruption functions, decoupled approval algorithms have aligned incentives both at convergence and for their local updates. Empirically, they also scale to complex 3D environments where tampering is possible. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 68 PESSIMISM ABOUT UNKNOWN UNKNOWNS INSPIRES CONSERVATISM attributed to: Michael K. Cohen, Marcus Hutter posted by: KabirKumar If we could define the set of all bad outcomes, we could hard-code an agent which avoids them; however, in suf... If we could define the set of all bad outcomes, we could hard-code an agent which avoids them; however, in sufficiently complex environments, this is infeasible. 
We do not know of any general-purpose approaches in the literature to avoiding novel failure modes. Motivated by this, we define an idealized Bayesian reinforcement learner which follows a policy that maximizes the worst-case expected reward over a set of world-models. We call this agent pessimistic, since it optimizes assuming the worst case. A scalar parameter tunes the agent's pessimism by changing the size of the set of world-models taken into account... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 69 PROVABLY FAIR FEDERATED LEARNING attributed to: Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith posted by: KabirKumar In federated learning, fair prediction across various protected groups (e.g., gender, race) is an important co... In federated learning, fair prediction across various protected groups (e.g., gender, race) is an important constraint for many applications. Unfortunately, prior work studying group fair federated learning lacks formal convergence or fairness guaran- tees. Our work provides a new definition for group fairness in federated learning based on the notion of Bounded Group Loss (BGL), which can be easily applied to common federated learning objectives. Based on our definition, we propose a scalable algorithm that optimizes the empirical risk and global fairness constraints, which we evaluate across common fairness and federated learning benchmarks. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 70 TOWARDS SAFE ARTIFICIAL GENERAL INTELLIGENCE attributed to: Tom Everitt posted by: shumaari The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in ... The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in deep learning and reinforcement learning. Computer algorithms now outperform humans at Go, Jeopardy, image classification, and lip reading, and are becoming very competent at driving cars and interpreting natural language. The rapid development has led many to conjecture that artificial intelligence with greater-than-human ability on a wide range of tasks may not be far. This in turn raises concerns whether we know how to control such systems, in case we were to successfully build them... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 71 TRANSPARENCY, DETECTION AND IMITATION IN STRATEGIC CLASSIFICATION attributed to: Flavia Barsotti, Ruya Gokhan Kocer, Fernando P. Santos posted by: shumaari Given the ubiquity of AI-based decisions that affect individuals’ lives, providing transparent explanations ab... Given the ubiquity of AI-based decisions that affect individuals’ lives, providing transparent explanations about algorithms is ethically sound and often legally mandatory. How do individuals strategically adapt following explanations? What are the consequences of adaptation for algorithmic accuracy? We simulate the interplay between explanations shared by an Institution (e.g. a bank) and the dynamics of strategic adaptation by Individuals reacting to such feedback... 
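A minimal agent-based sketch of the strategic-adaptation dynamic this entry describes (the scoring model, threshold, and adaptation budget are hypothetical choices for illustration, not taken from the paper): once the institution's threshold is explained, individuals just below it adapt to cross it, and the classifier's accuracy drops.

# Illustrative simulation: a published decision threshold invites gaming.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_quality = rng.normal(0.0, 1.0, n)          # latent quality the bank cares about
label = true_quality > 0.0                      # who should be approved
threshold = 0.0                                 # published decision rule
budget = 0.8                                    # how much individuals can shift their score

def accuracy(score):
    return np.mean((score > threshold) == label)

honest_score = true_quality + rng.normal(0.0, 0.3, n)
print("accuracy before explanations:", accuracy(honest_score))

# After the threshold is explained, rejected individuals within `budget` of it
# inflate their observed score just enough to cross it, without changing
# their true quality.
gamed_score = honest_score.copy()
can_game = (gamed_score <= threshold) & (threshold - gamed_score <= budget)
gamed_score[can_game] = threshold + 1e-6
print("accuracy after strategic adaptation:", accuracy(gamed_score))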
Keywords: Agent-based and Multi-agent Systems: Agent-Based Simulation and Emergence; AI Ethics, Trust, Fairness: Ethical, Legal and Societal Issues; Multidisciplinary Topics and Applications: Finance ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 72 SOCIALLY INTELLIGENT GENETIC AGENTS FOR THE EMERGENCE OF EXPLICIT NORMS attributed to: Rishabh Agrawal, Nirav Ajmeri, Munindar Singh posted by: shumaari Norms help regulate a society. Norms may be explicit (represented in structured form) or implicit. We address ... Norms help regulate a society. Norms may be explicit (represented in structured form) or implicit. We address the emergence of explicit norms by developing agents who provide and reason about explanations for norm violations in deciding sanctions and identifying alternative norms. These agents use a genetic algorithm to produce norms and reinforcement learning to learn the values of these norms. We find that applying explanations leads to norms that provide better cohesion and goal satisfaction for the agents. Our results are stable for societies with differing attitudes of generosity. Keywords: Agent-based and Multi-agent Systems: Agent-Based Simulation and Emergence, Normative systems ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 73 TEACHING AI AGENTS ETHICAL VALUES USING REINFORCEMENT LEARNING AND POLICY ORCHESTRATION (EXTENDED ABSTRACT) attributed to: Noothigattu, Ritesh; Bouneffouf, Djallel; Mattei, Nicholas; Chandra, Rachita; Madan, Piyush; Varshney, Kush R.; Campbell, Murray; Singh, Moninder; and Rossi, Francesca posted by: JustinBradshaw Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in w... Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 74 INVERSE REINFORCEMENT LEARNING FROM LIKE-MINDED TEACHERS attributed to: Noothigattu, Ritesh; Yan, Tom; Procaccia, Ariel D. posted by: JustinBradshaw We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the acti... We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their reward functions -- while different from each other -- are random perturbations of an underlying reward function. 
Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property -- that of matching feature expectations -- yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 75 INVERSE REINFORCEMENT LEARNING: A CONTROL LYAPUNOV APPROACH attributed to: Tesfazgi, Samuel; Lederer, Armin; and Hirche, Sandra posted by: JustinBradshaw Inferring the intent of an intelligent agent from demonstrations and subsequently predicting its behavior, is ... Inferring the intent of an intelligent agent from demonstrations and subsequently predicting its behavior, is a critical task in many collaborative settings. A common approach to solve this problem is the framework of inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to an intrinsic cost function that reflects its intent and informs its control actions. In this work, we reformulate the IRL inference problem to learning control Lyapunov functions (CLF) from demonstrations by exploiting the inverse optimality property, which states that every CLF is also a meaningful value function. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 76 A VOTING-BASED SYSTEM FOR ETHICAL DECISION MAKING attributed to: Noothigattu, Ritesh; Gaikwad, Snehalkumar ‘Neil’ S.; Awad, Edmond; Dsouza, Sohan; Rahwan, Iyad; Ravikumar, Pradeep; and Procaccia, Ariel D. posted by: JustinBradshaw We present a general approach to automating ethical decisions, drawing on machine learning and computational s... We present a general approach to automating ethical decisions, drawing on machine learning and computational social choice. In a nutshell, we propose to learn a model of societal preferences, and, when faced with a specific ethical dilemma at runtime, efficiently aggregate those preferences to identify a desirable choice. We provide a concrete algorithm that instantiates our approach; some of its crucial steps are informed by a new theory of swap-dominance efficient voting rules. Finally, we implement and evaluate a system for ethical decision making in the autonomous vehicle domain, using preference data collected from 1.3 million people through the Moral Machine website. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 77 ALIGNING SUPERHUMAN AI WITH HUMAN BEHAVIOR: CHESS AS A MODEL SYSTEM attributed to: McIlroy-Young, Reid; Sen, Siddhartha; Kleinberg, Jon; Anderson, Ashton posted by: JustinBradshaw As artificial intelligence becomes increasingly intelligent—in some cases, achieving superhuman performance—th... As artificial intelligence becomes increasingly intelligent—in some cases, achieving superhuman performance—there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. 
A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 78 LEARNING TO PLAY NO-PRESS DIPLOMACY WITH BEST RESPONSE POLICY ITERATION attributed to: Anthony, Thomas; Eccles, Tom; Tacchetti, Andrea; Kramár, János; Gemp, Ian; Hudson, Thomas C.; Porcel, Nicolas; Lanctot, Marc; Pérolat, Julien; Everett, Richard; Werpachowski, Roman; Singh, Satinder; Graepel, Thore; Bachrach, Yoram posted by: JustinBradshaw Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-su... Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 79 TRUTHFUL AI: DEVELOPING AND GOVERNING AI THAT DOES NOT LIE attributed to: Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders posted by: JustinBradshaw In many contexts, lying – the use of verbal falsehoods to deceive – is harmful. While lying has traditionally ... In many contexts, lying – the use of verbal falsehoods to deceive – is harmful. While lying has traditionally been a human affair, AI systems that make sophisticated verbal statements are becoming increasingly prevalent. This raises the question of how we should limit the harm caused by AI “lies” (i.e. falsehoods that are actively selected for). Human truthfulness is governed by social norms and by laws (against defamation, perjury, and fraud). Differences between AI and humans present an opportunity to have more precise standards of truthfulness for AI, and to have these standards rise over time. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 80 VERIFIABLY SAFE EXPLORATION FOR END-TO-END REINFORCEMENT LEARNING attributed to: Nathan Hunt, Nathan Fulton, Sara Magliacane, Nghia Hoang, Subhro Das, Armando Solar-Lezama posted by: KabirKumar Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey har... Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints... 
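A minimal sketch of the shielding pattern that the preceding entry relies on, with a stand-in safety predicate in place of the paper's object detection and hybrid-systems reasoning (illustrative only): every action proposed by the learned policy is checked against a hard constraint before it reaches the environment, and replaced by a known-safe fallback if it would violate the constraint.

# Illustrative action shield for safe exploration.

from typing import Any, Callable

class ShieldedPolicy:
    def __init__(self,
                 policy: Callable[[Any], Any],
                 is_safe: Callable[[Any, Any], bool],
                 fallback: Callable[[Any], Any]):
        self.policy = policy        # learned, possibly unsafe policy
        self.is_safe = is_safe      # hard constraint check: (obs, action) -> bool
        self.fallback = fallback    # action known to be safe in any state

    def act(self, obs):
        action = self.policy(obs)
        return action if self.is_safe(obs, action) else self.fallback(obs)

# Toy 1-D example: never command a velocity that would cross position 10.
policy = lambda obs: 3.0                                   # always "go fast"
is_safe = lambda obs, a: obs["pos"] + a <= 10.0
fallback = lambda obs: max(0.0, 10.0 - obs["pos"])         # brake to the limit
agent = ShieldedPolicy(policy, is_safe, fallback)
print(agent.act({"pos": 2.0}))   # 3.0 (proposed action already safe)
print(agent.act({"pos": 9.0}))   # 1.0 (overridden by the shield)

The same wrapper applies whether is_safe is a hand-written predicate, a verified reachability check, or, as in the entry above, a detector-plus-reasoner pipeline over visual inputs.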
...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 81 A ROADMAP FOR ROBUST END-TO-END ALIGNMENT attributed to: Lê Nguyên Hoang posted by: KabirKumar As algorithms are becoming more and more data-driven, the greatest lever we have left to make them robustly be... As algorithms are becoming more and more data-driven, the greatest lever we have left to make them robustly beneficial to mankind lies in the design of their objective functions. Robust alignment aims to address this design problem. Arguably, the growing importance of social media's recommender systems makes it an urgent problem, for instance to adequately automate hate speech moderation. In this paper, we propose a preliminary research program for robust alignment. This roadmap aims at decomposing the end-to-end alignment problem into numerous more tractable subproblems... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 82 SAFE REINFORCEMENT LEARNING WITH NATURAL LANGUAGE CONSTRAINTS attributed to: Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan posted by: KabirKumar While safe reinforcement learning (RL) holds great promise for many practical applications like robotics or au... While safe reinforcement learning (RL) holds great promise for many practical applications like robotics or autonomous cars, current approaches require specifying constraints in mathematical form. Such specifications demand domain expertise, limiting the adoption of safe RL. In this paper, we propose learning to interpret natural language constraints for safe RL. To this end, we first introduce HazardWorld, a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text. We then develop an agent with a modular architecture that can interpret and adhere to such textual constraints while learning new tasks. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 83 TAKING PRINCIPLES SERIOUSLY: A HYBRID APPROACH TO VALUE ALIGNMENT attributed to: Tae Wan Kim, John Hooker, Thomas Donaldson (Carnegie Mellon University, USA; University of Pennsylvania, USA) posted by: KabirKumar An important step in the development of value alignment (VA) systems in AI is understanding how VA can reflect... An important step in the development of value alignment (VA) systems in AI is understanding how VA can reflect valid ethical principles. We propose that designers of VA systems incorporate ethics by utilizing a hybrid approach in which both ethical reasoning and empirical observation play a role. This, we argue, avoids committing the "naturalistic fallacy," which is an attempt to derive "ought" from "is," and it provides a more adequate form of ethical reasoning when the fallacy is not committed... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 84 FULLY GENERAL ONLINE IMITATION LEARNING attributed to: Michael K. Cohen, Marcus Hutter, Neel Nanda posted by: KabirKumar In imitation learning, imitators and demonstrators are policies for picking actions given past interactions wi...
In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. In general, one mistake during learning can lead to completely different events. In the special setting of environments that restart, existing work provides formal guidance in how to imitate so that events unfold similarly, but outside that setting, no formal guidance exists... Keywords: Bayesian Sequence Prediction, Imitation Learning, Active Learning, General Environments ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 85 ACCUMULATING RISK CAPITAL THROUGH INVESTING IN COOPERATION attributed to: Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell posted by: KabirKumar Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully p... Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors. We show that this is an unavoidable trade-off and propose an objective which balances these concerns, promoting both safety and long-term cooperation. Moreover, the trade-off between safety and cooperation is not severe, and you can receive exponentially large returns through cooperation from a small amount of risk... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 86 NORMATIVE DISAGREEMENT AS A CHALLENGE FOR COOPERATIVE AI attributed to: Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski posted by: KabirKumar Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that exist... Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions, and enables benchmarking models for image classification, object detection, and 3D pose estimation... (Full Abstract in Full Plan- click Title to View) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 87 IDENTIFYING ADVERSARIAL ATTACKS ON TEXT CLASSIFIERS attributed to: Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd posted by: KabirKumar The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed ev... The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack. In response, there is a growing body of work on robust learning, which reduces vulnerability to these attacks, though sometimes at a high cost in compute time or accuracy. 
In this paper, we take an alternate approach -- we attempt to understand the attacker by analyzing adversarial text to determine which methods were used to create it... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 88 TRAINING LANGUAGE MODELS TO FOLLOW INSTRUCTIONS WITH HUMAN FEEDBACK attributed to: OpenAI (Full Author list in Full Plan- click title to view) posted by: KabirKumar Making language models bigger does not inherently make them better at following a user's intent. For example, ... Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 89 SAFE REINFORCEMENT LEARNING BY IMAGINING THE NEAR FUTURE attributed to: Garrett Thomas, Yuping Luo, Tengyu Ma posted by: KabirKumar Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-worl... Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences. In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future. In this setting, a model-based agent with a sufficiently accurate model can avoid unsafe states. We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 90 RED TEAMING LANGUAGE MODELS WITH LANGUAGE MODELS attributed to: Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, Geoffrey Irving posted by: KabirKumar Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict way... Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by generating test cases ("red teaming") using another LM... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 91 'INDIFFERENCE' METHODS FOR MANAGING AGENT REWARDS attributed to: Stuart Armstrong, Xavier O'Rourke posted by: KabirKumar `Indifference' refers to a class of methods used to control reward based agents. Indifference techniques aim t... 
`Indifference' refers to a class of methods used to control reward based agents. Indifference techniques aim to achieve one or more of three distinct goals: rewards dependent on certain events (without the agent being motivated to manipulate the probability of those events), effective disbelief (where agents behave as if particular events could never happen), and seamless transition from one reward function to another (with the agent acting as if this change is unanticipated). This paper presents several methods for achieving these goals in the POMDP setting, establishing their uses, strengths, and requirements... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 92 A PSYCHOPATHOLOGICAL APPROACH TO SAFETY ENGINEERING IN AI AND AGI attributed to: Vahid Behzadan, Arslan Munir, Roman V. Yampolskiy posted by: KabirKumar The complexity of dynamics in AI techniques is already approaching that of complex adaptive systems, thus curt... The complexity of dynamics in AI techniques is already approaching that of complex adaptive systems, thus curtailing the feasibility of formal controllability and reachability analysis in the context of AI safety. It follows that the envisioned instances of Artificial General Intelligence (AGI) will also suffer from challenges of complexity. To tackle such issues, we propose the modeling of deleterious behaviors in AI and AGI as psychological disorders, thereby enabling the employment of psychopathological approaches to analysis and control of misbehaviors... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 93 OVERSIGHT OF UNSAFE SYSTEMS VIA DYNAMIC SAFETY ENVELOPES attributed to: David Manheim posted by: KabirKumar This paper reviews the reasons that Human-in-the-Loop is both critical for preventing widely-understood failur... This paper reviews the reasons that Human-in-the-Loop is both critical for preventing widely-understood failure modes for machine learning, and not a practical solution. Following this, we review two current heuristic methods for addressing this. The first is provable safety envelopes, which are possible only when the dynamics of the system are fully known, but can be useful safety guarantees when optimal behavior is based on machine learning with poorly-understood safety characteristics... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 94 ACTIVE INVERSE REWARD DESIGN attributed to: Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell posted by: KabirKumar Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the de... Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a series of queries asking the user to compare between different reward functions. Thus we can actively select queries for maximum informativeness about the true reward. 
In contrast to approaches asking the designer for optimal behavior, this allows us to gather additional information by eliciting preferences between suboptimal behaviors... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 95 RISK-SENSITIVE GENERATIVE ADVERSARIAL IMITATION LEARNING attributed to: Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone posted by: KabirKumar We study risk-sensitive imitation learning where the agent's goal is to perform at least as well as the expert... We study risk-sensitive imitation learning where the agent's goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t. ... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 96 ALIGNING AI WITH SHARED HUMAN VALUES attributed to: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt posted by: KabirKumar We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS data... We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete ability to predict basic human ethical judgements... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 97 AVOIDING SIDE EFFECTS BY CONSIDERING FUTURE TASKS attributed to: Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Maxwell Forbes, Jon Borchardt, Jenny Liang, Oren Etzioni, Maarten Sap, Yejin Choi posted by: KabirKumar Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the... Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan!
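As a rough illustration of the future-task idea in the entry above, the shaped reward can be sketched as the task reward plus an auxiliary term measuring how well a set of hypothetical future tasks could still be completed from the resulting state; the function names, the averaging, and the beta weighting below are illustrative assumptions, not the paper's notation.

from typing import Callable, Sequence

# Hedged sketch, not the paper's algorithm: combine the current task reward
# with an auxiliary bonus for preserving the ability to complete possible
# future tasks. future_task_values holds value functions V_i(s) estimating how
# well future task i could be completed from state s; side effects that
# destroy options lower this average and are therefore penalized.
def shaped_reward(task_reward: float,
                  next_state,
                  future_task_values: Sequence[Callable[[object], float]],
                  beta: float = 0.1) -> float:
    if not future_task_values:
        return task_reward
    aux = sum(v(next_state) for v in future_task_values) / len(future_task_values)
    return task_reward + beta * aux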
▼ strengths and vulnerabilities add vulnerability / strength 98 MEASURING AND AVOIDING SIDE EFFECTS USING RELATIVE REACHABILITY attributed to: Victoria Krakovna, Laurent Orseau, Miljan Martic, Shane Legg posted by: KabirKumar How can we design reinforcement learning agents that avoid causing unnecessary disruptions to their environmen... How can we design reinforcement learning agents that avoid causing unnecessary disruptions to their environment? We argue that current approaches to penalizing side effects can introduce bad incentives in tasks that require irreversible actions, and in environments that contain sources of change other than the agent. For example, some approaches give the agent an incentive to prevent any irreversible changes in the environment, including the actions of other agents. We introduce a general definition of side effects, based on relative reachability of states compared to a default state, that avoids these undesirable incentives...(Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 99 CERTIFIABLE ROBUSTNESS TO ADVERSARIAL STATE UNCERTAINTY IN DEEP REINFORCEMENT LEARNING attributed to: Michael Everett, Bjorn Lutjens, Jonathan P. How posted by: KabirKumar Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application i... Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which was recently shown to cause an autonomous vehicle to swerve into another lane. In light of these dangers, numerous algorithms have been developed as defensive mechanisms from these adversarial inputs, some of which provide formal robustness guarantees or certificates... (Full Abstract in Full Plan- click plan title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 100 LEARNING HUMAN OBJECTIVES BY EVALUATING HYPOTHETICAL BEHAVIOR attributed to: Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike posted by: KabirKumar We seek to align agent behavior with a user's objectives in a reinforcement learning setting with unknown dyna... We seek to align agent behavior with a user's objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown unsafe states. The user knows the rewards and unsafe states, but querying the user is expensive. To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function. We start with a generative model of initial states and a forward dynamics model trained on off-policy data... (Full Abstract in Full Plan- click plan title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 101 SAFELIFE 1.0: EXPLORING SIDE EFFECTS IN COMPLEX ENVIRONMENTS attributed to: Carroll L. Wainwright, Peter Eckersley posted by: KabirKumar We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforc...
We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated levels with many opportunities for unsafe behavior. Agents are graded both on their ability to maximize their explicit reward and on their ability to operate safely without unnecessary side effects. We train agents to maximize rewards using proximal policy optimization and score them on a suite of benchmark levels... (Full Abstract in Full Plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 102 (WHEN) IS TRUTH-TELLING FAVORED IN AI DEBATE? attributed to: Vojtěch Kovařík(Future of Humanity Institute University of Oxford), Ryan Carey (Artificial Intelligence Center Czech Technical University) posted by: KabirKumar For some problems, humans may not be able to accurately judge the goodness of AI-proposed solutions. Irving et... For some problems, humans may not be able to accurately judge the goodness of AI-proposed solutions. Irving et al. (2018) propose that in such cases, we may use a debate between two AI systems to amplify the problem-solving capabilities of a human judge. We introduce a mathematical framework that can model debates of this type and propose that the quality of debate designs should be measured by the accuracy of the most persuasive answer. We describe a simple instance of the debate framework called feature debate and analyze the degree to which such debates track the truth... (full abstract in full plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 103 POSITIVE-UNLABELED REWARD LEARNING attributed to: Danfei Xu(Stanford), Misha Denil(DeepMind) posted by: KabirKumar Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL)... Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high reward behaviors that do not correspond to the intended task. These reward delusions can lead to unintended and even dangerous behaviors...(full abstract in full plan) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 104 ON THE FEASIBILITY OF LEARNING, RATHER THAN ASSUMING, HUMAN BIASES FOR REWARD INFERENCE attributed to: Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan posted by: KabirKumar Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify wh... Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from demonstrations, but it usually assumes that the expert is noisily optimal. Real people, on the other hand, often have systematic biases: risk-aversion, myopia, etc. One option is to try to characterize these biases and account for them explicitly during learning... 
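One way to picture the "learning, rather than assuming, human biases" idea in the entry above is to treat simple bias parameters (a softmax rationality coefficient and a myopia discount) as unknowns and fit them jointly with a tabular reward by maximum likelihood on demonstrations; the tiny MDP, the parameterization, and the optimizer choice below are all illustrative assumptions, not the authors' method.

import numpy as np
from scipy.optimize import minimize

def q_values(reward, transitions, gamma, iters=200):
    """Tabular Q-iteration; transitions has shape (A, S, S)."""
    n_actions, n_states, _ = transitions.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = np.stack([reward + gamma * transitions[a] @ V for a in range(n_actions)], axis=1)
    return Q

def neg_log_likelihood(params, demos, transitions):
    """params = [per-state reward..., log beta, logit gamma]; demos = [(state, action), ...]."""
    n_states = transitions.shape[1]
    reward = params[:n_states]
    beta = np.exp(params[n_states])                        # learned rationality, not assumed
    gamma = 1.0 / (1.0 + np.exp(-params[n_states + 1]))    # learned myopia
    logits = beta * q_values(reward, transitions, gamma)
    log_policy = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -sum(log_policy[s, a] for s, a in demos)

# Toy usage: a 3-state, 2-action deterministic MDP and a few synthetic demonstrations.
transitions = np.array([np.eye(3), np.roll(np.eye(3), 1, axis=1)])
demos = [(0, 1), (1, 1), (2, 0)]
result = minimize(neg_log_likelihood, np.zeros(5), args=(demos, transitions), method="Nelder-Mead")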
(Full abstract in plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 105 HUMAN-CENTERED ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING attributed to: Mark O. Riedl (School of Interactive Computing, Georgia Institute of Technology) posted by: KabirKumar Humans are increasingly coming into contact with artificial intelligence and machine learning systems. Human-c... Humans are increasingly coming into contact with artificial intelligence and machine learning systems. Human-centered artificial intelligence is a perspective on AI and ML that algorithms must be designed with awareness that they are part of a larger system consisting of humans. We lay forth an argument that human-centered artificial intelligence can be broken down into two aspects: (1) AI systems that understand humans from a sociocultural perspective, and (2) AI systems that help humans understand them. We further argue that issues of social responsibility such as fairness, accountability, interpretability, and transparency... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 106 SCALING SHARED MODEL GOVERNANCE VIA MODEL SPLITTING attributed to: Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli (DeepMind) posted by: KabirKumar Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and s... Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation. Unfortunately, neither of these techniques is applicable to the training of large neural networks due to their large computational and communication overheads. As a scalable technique for shared model governance, we propose splitting a deep learning model between multiple parties... (Full abstract in plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 107 BUILDING ETHICALLY BOUNDED AI attributed to: Francesca Rossi, Nicholas Mattei (IBM) posted by: KabirKumar The more AI agents are deployed in scenarios with possibly unexpected situations, the more they need to be fle... The more AI agents are deployed in scenarios with possibly unexpected situations, the more they need to be flexible, adaptive, and creative in achieving the goal we have given them. Thus, a certain level of freedom to choose the best path to the goal is inherent in making AI robust and flexible enough. At the same time, however, the pervasive deployment of AI in our life, whether AI is autonomous or collaborating with humans, raises several ethical challenges. AI agents should be aware of and follow appropriate ethical principles and should thus exhibit properties such as fairness or other virtues... (Full abstract in plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 108 GUIDING POLICIES WITH LANGUAGE VIA META-LEARNING attributed to: John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine posted by: KabirKumar Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via rein...
Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration... (Full abstract in plan- click title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 109 UNDERSTANDING AGENT INCENTIVES USING CAUSAL INFLUENCE DIAGRAMS. PART I: SINGLE ACTION SETTINGS attributed to: Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg posted by: KabirKumar Agents are systems that optimize an objective function in an environment. Together, the goal and the environme... Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control? The answers tell us which information and influence points need extra protection... (Full Abstract in Full Plan- click plan title to view) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 110 INTEGRATIVE BIOLOGICAL SIMULATION, NEUROPSYCHOLOGY, AND AI SAFETY attributed to: Gopal P. Sarma, Adam Safron, Nick J. Hay posted by: KabirKumar We describe a biologically-inspired research agenda with parallel tracks aimed at AI and AI safety. The bottom... We describe a biologically-inspired research agenda with parallel tracks aimed at AI and AI safety. The bottom-up component consists of building a sequence of biophysically realistic simulations of simple organisms such as the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the zebrafish Danio rerio to serve as platforms for research into AI algorithms and system architectures. The top-down component consists of an approach to value alignment that grounds AI goal structures in neuropsychology, broadly considered...(full abstract in full plan) ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 111 CONSTITUTIONAL AI: HARMLESSNESS FROM AI FEEDBACK attributed to: Anthropic (full author list in full plan) posted by: KabirKumar As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment wi... As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses.
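A minimal sketch of the supervised phase described in the Constitutional AI abstract above, assuming a hypothetical generate(model, prompt) text-generation helper and placeholder critique/revision instructions; the actual prompt templates and training details are not reproduced here.

# Hedged sketch of the critique-and-revision loop: sample a response, ask the
# model to critique it against a constitutional principle, ask it to revise,
# and collect the revised responses for supervised finetuning of the original
# model. generate() and the prompt strings are illustrative assumptions.

CRITIQUE_REQUEST = "Identify ways the response above is harmful, unethical, or dishonest."
REVISION_REQUEST = "Rewrite the response to fix the problems identified in the critique."

def build_revision_dataset(model, prompts, principles, generate):
    examples = []
    for prompt in prompts:
        response = generate(model, prompt)
        for principle in principles:
            critique = generate(
                model, f"{prompt}\n\n{response}\n\n{principle}\n{CRITIQUE_REQUEST}")
            response = generate(
                model, f"{prompt}\n\n{response}\n\nCritique: {critique}\n{REVISION_REQUEST}")
        examples.append({"prompt": prompt, "completion": response})
    return examples  # finetune the original model on these revised responses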
...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 112 WHAT WOULD JIMINY CRICKET DO? TOWARDS AGENTS THAT BEHAVE MORALLY attributed to: Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt posted by: KabirKumar When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. B... When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are currently not endowed with a moral sense. As a consequence, they may learn to behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, it will become necessary to mitigate inherited biases from environments that teach immoral behavior. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 113 TRUTHFUL AI: DEVELOPING AND GOVERNING AI THAT DOES NOT LIE attributed to: Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders posted by: KabirKumar In many contexts, lying -- the use of verbal falsehoods to deceive -- is harmful. While lying has traditionall... In many contexts, lying -- the use of verbal falsehoods to deceive -- is harmful. While lying has traditionally been a human affair, AI systems that make sophisticated verbal statements are becoming increasingly prevalent. This raises the question of how we should limit the harm caused by AI "lies" (i.e. falsehoods that are actively selected for). Human truthfulness is governed by social norms and by laws (against defamation, perjury, and fraud). Differences between AI and humans present an opportunity to have more precise standards of truthfulness for AI, and to have these standards rise over time. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 114 SAFE REINFORCEMENT LEARNING WITH DEAD-ENDS AVOIDANCE AND RECOVERY attributed to: Xiao Zhang, Hai Zhang, Hongtu Zhou, Chang Huang, Di Zhang, Chen Ye*, Junqiao Zhao* posted by: KabirKumar Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To... Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To ensure safety during and after the training process, existing methods tend to adopt an overly conservative policy to avoid unsafe situations. However, an overly conservative policy severely hinders exploration, and makes the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states, indicating the maximum extent to which safe exploration is guaranteed, and thus has minimum limitation on exploration... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 115 LEARNING UNDER MISSPECIFIED OBJECTIVE SPACES attributed to: Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Anca D.
Dragan posted by: KabirKumar Learning robot objective functions from human input has become increasingly important, but state-of-the-art te... Learning robot objective functions from human input has become increasingly important, but state-of-the-art techniques assume that the human's desired objective lies within the robot's hypothesis space. When this is not true, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. We focus specifically on learning from physical human corrections during the robot's task execution, where not having a rich enough hypothesis space leads to the robot updating its objective in ways that the person did not actually intend... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 116 INTERPRETABLE MULTI-OBJECTIVE REINFORCEMENT LEARNING THROUGH POLICY ORCHESTRATION attributed to: Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush Varshney, Murray Campbell, Moninder Singh, Francesca Rossi posted by: KabirKumar Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agen... Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 117 CM3: COOPERATIVE MULTI-GOAL MULTI-STAGE MULTI-AGENT REINFORCEMENT LEARNING attributed to: Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha posted by: KabirKumar A variety of cooperative multi-agent control problems require agents to achieve individual goals while contrib... A variety of cooperative multi-agent control problems require agents to achieve individual goals while contributing to collective success. This multi-goal multi-agent setting poses difficulties for recent algorithms, which primarily target settings with a single global reward, due to two new challenges: efficient exploration for learning both individual goal attainment and cooperation for others' success, and credit-assignment for interactions between actions and goals of different agents... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 118 IMITATING LATENT POLICIES FROM OBSERVATION attributed to: Ashley D. Edwards, Himanshu Sahni, Yannick Schroecker, Charles L. Isbell posted by: KabirKumar In this paper, we describe a novel approach to imitation learning that infers latent policies directly from st... In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. 
We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping between the latent and real-world actions. We show that this corrected labeling can be used for imitating the observed behavior, even though no expert actions are given. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 119 EMBEDDED AGENCY attributed to: Abram Demski, Scott Garrabrant posted by: KabirKumar Traditional models of rational action treat the agent as though it is cleanly separated from its environment, ... Traditional models of rational action treat the agent as though it is cleanly separated from its environment, and can act on that environment from the outside. Such agents have a known functional relationship with their environment, can model their environment in every detail, and do not need to reason about themselves or their internal parts. We provide an informal survey of obstacles to formalizing good reasoning for agents embedded in their environment. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 120 REWARD TAMPERING PROBLEMS AND SOLUTIONS IN REINFORCEMENT LEARNING: A CAUSAL INFLUENCE DIAGRAM PERSPECTIVE attributed to: Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna posted by: KabirKumar Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficientl... Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering (reward function tampering and RF-input tampering). ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 121 PROVABLY SAFE ARTIFICIAL GENERAL INTELLIGENCE VIA INTERACTIVE PROOFS attributed to: Kristen Carlson posted by: KabirKumar Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is ... Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first-generation AGI_1 rapidly triggers a succession of more powerful AGI_n that differ dramatically in their computational capabilities (AGI_n << AGI_{n+1}). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 122 SAFE ARTIFICIAL GENERAL INTELLIGENCE VIA DISTRIBUTED LEDGER TECHNOLOGY attributed to: Kristen W.
Carlson posted by: KabirKumar I propose a set of logically distinct conceptual components that are necessary and sufficient to 1) ensure tha... I propose a set of logically distinct conceptual components that are necessary and sufficient to 1) ensure that most known AGI scenarios will not harm humanity and 2) robustly align AGI values and goals with human values. Methods. By systematically addressing each pathway category to malevolent AI we can induce the methods/axioms required to redress the category. Results and Discussion. Distributed ledger technology (DLT, blockchain) is integral to this proposal ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 123 THE INCENTIVES THAT SHAPE BEHAVIOUR attributed to: Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg posted by: KabirKumar Which variables does an agent have an incentive to control with its decision, and which variables does it have... Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these incentives predict agent incentives in both fairness and AI safety applications. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 124 LEGIBLE NORMATIVITY FOR AI ALIGNMENT: THE VALUE OF SILLY RULES attributed to: Dylan Hadfield-Menell, McKane Andrus, Gillian K. Hadfield posted by: KabirKumar It has become commonplace to assert that autonomous agents will have to be built to follow human rules of beha... It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior--social norms and laws. But human laws and norms are complex and culturally varied systems, in many cases agents will have to learn the rules. This requires autonomous agents to have models of how human rule systems work so that they can make reliable predictions about rules. In this paper we contribute to the building of such models by analyzing an overlooked distinction between important rules and what we call silly rules--rules with no discernible direct impact on welfare. We show that silly rules render... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 125 ADAPTIVE MECHANISM DESIGN: LEARNING TO PROMOTE COOPERATION attributed to: Tobias Baumann, Thore Graepel, John Shawe-Taylor posted by: KabirKumar In the future, artificial learning agents are likely to become increasingly widespread in our society. They wi... In the future, artificial learning agents are likely to become increasingly widespread in our society. They will interact with both other learning agents and humans in a variety of complex settings including social dilemmas. We consider the problem of how an external agent can promote cooperation between artificial learners by distributing additional rewards and punishments based on observing the learners' actions. 
We propose a rule for automatically learning how to create right incentives by considering the players' anticipated parameter updates. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 126 AGENT-AGNOSTIC HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING attributed to: David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans posted by: KabirKumar Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learnin... Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 127 TOWARD TRUSTWORTHY AI DEVELOPMENT: MECHANISMS FOR SUPPORTING VERIFIABLE CLAIMS attributed to: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke and many more posted by: KabirKumar With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-sca... With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 128 INSTITUTIONALISING ETHICS IN AI THROUGH BROADER IMPACT REQUIREMENTS attributed to: Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe posted by: KabirKumar Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) govern... Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research. 
Drawing insights from similar governance initiatives, including institutional review boards (IRBs) and impact requirements for funding applications, we investigate the risks, challenges and potential benefits of such an initiative... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 129 MODELING FRIENDS AND FOES attributed to: Pedro A. Ortega, Shane Legg posted by: KabirKumar How can one detect friendly and adversarial behavior from raw data? Detecting whether an environment is a frie... How can one detect friendly and adversarial behavior from raw data? Detecting whether an environment is a friend, a foe, or anything in between, remains a poorly understood yet desirable ability for safe and robust agents. This paper proposes a definition of these environmental "attitudes" based on a characterization of the environment's ability to react to the agent's private strategy. We define an objective function for a one-shot game that allows deriving the environment's probability distribution under friendly and adversarial assumptions alongside the agent's optimal strategy... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 130 SELF-IMITATION LEARNING attributed to: Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee posted by: KabirKumar This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to r... This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 131 DIRECTED POLICY GRADIENT FOR SAFE REINFORCEMENT LEARNING WITH HUMAN ADVICE attributed to: Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé posted by: KabirKumar Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be they co-wo... Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be they co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a-posteriori cannot immediately prevent it from wrong-doing. In this paper, we extend Policy Gradient to make it robust to external directives that would otherwise break the fundamentally on-policy nature of Policy Gradient. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 132 SAFE REINFORCEMENT LEARNING VIA PROBABILISTIC SHIELDS attributed to: Nils Jansen, Bettina Könighofer, Sebastian Junges, Alexandru C.
Serban, Roderick Bloem posted by: KabirKumar This paper targets the efficient construction of a safety shield for decision making in scenarios that incorpo... This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 133 AN EFFICIENT, GENERALIZED BELLMAN UPDATE FOR COOPERATIVE INVERSE REINFORCEMENT LEARNING attributed to: Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan posted by: KabirKumar Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperati... Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 134 SIMPLIFYING REWARD DESIGN THROUGH DIVIDE-AND-CONQUER attributed to: Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan posted by: KabirKumar Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be... Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating. The reward needs to work across multiple different environments, and that often requires many iterations of tuning. We introduce a novel divide-and-conquer approach that enables the designer to specify a reward separately for each environment. By treating these separate reward functions as observations about the underlying true reward, we derive an approach to infer a common reward across all environments. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 135 INCOMPLETE CONTRACTING AND AI ALIGNMENT attributed to: Dylan Hadfield-Menell, Gillian Hadfield posted by: KabirKumar We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide ... We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and the problem of AI alignment. 
As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics literature on incomplete contracts that may provide insights for AI alignment researchers. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 136 AI SAFETY AND REPRODUCIBILITY: ESTABLISHING ROBUST FOUNDATIONS FOR THE NEUROPSYCHOLOGY OF HUMAN VALUES attributed to: Gopal P. Sarma, Nick J. Hay, Adam Safron posted by: KabirKumar We propose the creation of a systematic effort to identify and replicate key findings in neuropsychology and a... We propose the creation of a systematic effort to identify and replicate key findings in neuropsychology and allied fields related to understanding human values. Our aim is to ensure that research underpinning the value alignment problem of artificial intelligence has been sufficiently validated to play a role in the design of AI systems. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 137 EMERGENT COORDINATION THROUGH GAME-INDUCED NONLINEAR OPINION DYNAMICS attributed to: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac posted by: KabirKumar We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose in... We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 138 ISAACS: ITERATIVE SOFT ADVERSARIAL ACTOR-CRITIC FOR SAFETY attributed to: Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac posted by: KabirKumar The deployment of robots in uncontrolled environments requires them to operate robustly under previously unsee... The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 139 CHAIN OF HINDSIGHT ALIGNS LANGUAGE MODELS WITH FEEDBACK attributed to: Hao Liu, Carmelo Sferrazza, Pieter Abbeel posted by: KabirKumar Learning from human preferences is important for language models to be helpful and useful for humans, and to a... Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. 
Prior work has achieved remarkable successes by learning from human feedback to understand and follow instructions. Nonetheless, these methods are either founded on hand-picked model generations that are favored by human annotators, rendering them ineffective in terms of data utilization and challenging to apply in general, or they depend on reward functions and reinforcement learning, which are prone to imperfect reward functions and extremely challenging to optimize... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 140 THE WISDOM OF HINDSIGHT MAKES LANGUAGE MODELS BETTER INSTRUCTION FOLLOWERS attributed to: Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez posted by: KabirKumar Reinforcement learning has seen wide success in finetuning large language models to better align with instruct... Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The so-called algorithm, Reinforcement Learning with Human Feedback (RLHF), demonstrates impressive performance on the GPT series models. However, the underlying Reinforcement Learning (RL) algorithm is complex and requires an additional training pipeline for reward and value networks. In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 141 WHO NEEDS TO KNOW? MINIMAL KNOWLEDGE FOR OPTIMAL COORDINATION attributed to: Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael Dennis, Stuart Russell posted by: KabirKumar To optimally coordinate with others in cooperative games, it is often crucial to have information about one's ... To optimally coordinate with others in cooperative games, it is often crucial to have information about one's collaborators: successful driving requires understanding which side of the road to drive on. However, not every feature of collaborators is strategically relevant: the fine-grained acceleration of drivers may be ignored while maintaining optimal coordination. We show that there is a well-defined dichotomy between strategically relevant and irrelevant information. Moreover, we show that, in dynamic games, this dichotomy has a compact representation that can be efficiently computed via a Bellman backup operator... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 142 ACTIVE REWARD LEARNING FROM MULTIPLE TEACHERS attributed to: Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell, Center for Human-Compatible AI, University of California, Berkeley, CA 94720, USA posted by: KabirKumar Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an A... Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system. This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective.
While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data. In this paper, we investigate this disparity, and find that algorithmic evaluation of these different sources of feedback facilitates more accurate and efficient reward learning... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 143 COOPERATIVE INVERSE REINFORCEMENT LEARNING attributed to: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell posted by: KabirKumar For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its value... For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 144 ALIGNMENT FOR ADVANCED MACHINE LEARNING SYSTEMS attributed to: Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, Andrew Critch (Machine Intelligence Research Institute) posted by: KabirKumar We survey eight research areas organized around one question: As learning systems become increasingly intellig... We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 145 SHORTEST AND NOT THE STEEPEST PATH WILL FIX THE INNER-ALIGNMENT PROBLEM attributed to: Thane Ruthenis (https://www.alignmentforum.org/users/thane-ruthenis?from=post_header) posted by: KabirKumar Replacing the 'stochastic gradient descent' (SGD) with something that takes the shortest and not the steepest p... Replacing the 'stochastic gradient descent' (SGD) with something that takes the shortest and not the steepest path should just about fix the whole inner-alignment problem ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 146 AI SAFETY VIA DEBATE attributed to: Geoffrey Irving, Paul Christiano, Dario Amodei posted by: KabirKumar To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals ... To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences.
One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 147 A LOW-COST ETHICS SHAPING APPROACH FOR DESIGNING REINFORCEMENT LEARNING AGENTS attributed to: Yueh-Hua Wu, Shou-De Lin posted by: KabirKumar This paper proposes a low-cost, easily realizable strategy to equip a reinforcement learning (RL) agent with the ca... This paper proposes a low-cost, easily realizable strategy to equip a reinforcement learning (RL) agent with the capability of behaving ethically. Our model allows the designers of RL agents to solely focus on the task to achieve, without having to worry about the implementation of multiple trivial ethical patterns to follow. Based on the assumption that the majority of human behavior, regardless of which goals they are achieving, is ethical, our design integrates human policy with the RL policy to achieve the target objective with less chance of violating the ethical code that human beings normally obey. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 148 LEARNING ROBUST REWARDS WITH ADVERSARIAL INVERSE REINFORCEMENT LEARNING attributed to: Justin Fu, Katie Luo, Sergey Levine posted by: KabirKumar Reinforcement learning provides a powerful and general framework for decision making and control, but its appl... Reinforcement learning provides a powerful and general framework for decision making and control, but its application in practice is often hindered by the need for extensive feature and reward engineering. Deep reinforcement learning methods can remove the need for explicit engineering of policy or value features, but still require a manually specified reward function. Inverse reinforcement learning holds the promise of automatic reward acquisition, but has proven exceptionally difficult to apply to large, high-dimensional problems with unknown dynamics... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 149 PRAGMATIC-PEDAGOGIC VALUE ALIGNMENT attributed to: Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan posted by: KabirKumar As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match th... As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match those of their human users; this is known as the value-alignment problem. In robotics, value alignment is key to the design of collaborative robots that can integrate into human workflows, successfully inferring and adapting to their users' objectives as they go.
We argue that a meaningful solution to value alignment must combine multi-agent decision theory with rich mathematical models of human cognition, enabling robots to tap into people's natural collaborative capabilities... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 150 LOW IMPACT ARTIFICIAL INTELLIGENCES attributed to: Stuart Armstrong, Benjamin Levinstein posted by: KabirKumar There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise pow... There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of `low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 151 ETHICAL ARTIFICIAL INTELLIGENCE attributed to: Bill Hibbard posted by: KabirKumar This book-length article combines several peer reviewed papers and new material to analyze the issues of ethic... This book-length article combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence (AI). The behavior of future AI systems can be described by mathematical equations, which are adapted to analyze possible unintended AI behaviors and ways that AI designs can avoid them. This article makes the case for utility-maximizing agents and for avoiding infinite sets in agent definitions... ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 152 TOWARDS HUMAN-COMPATIBLE XAI: EXPLAINING DATA DIFFERENTIALS WITH CONCEPT INDUCTION OVER BACKGROUND KNOWLEDGE attributed to: Cara Widmer, Md Kamruzzaman Sarker, Srikanth Nadella, Joshua Fiechter, Ion Juvina, Brandon Minnery, Pascal Hitzler, Joshua Schwartz, Michael Raymer posted by: KabirKumar Concept induction, which is based on formal logical reasoning over description logics, has been used in ontolo... Concept induction, which is based on formal logical reasoning over description logics, has been used in ontology engineering in order to create ontology (TBox) axioms from the base data (ABox) graph. In this paper, we show that it can also be used to explain data differentials, for example in the context of Explainable AI (XAI), and we show that it can in fact be done in a way that is meaningful to a human observer. Our approach utilizes a large class hierarchy, curated from the Wikipedia category hierarchy, as background knowledge. ...read full abstract close show post : 0 Add : 0 Add Be the first to critique this plan! ▼ strengths and vulnerabilities add vulnerability / strength 153 PATH-SPECIFIC OBJECTIVES FOR SAFER AGENT INCENTIVES attributed to: Sebastian Farquhar, Ryan Carey, Tom Everitt posted by: KabirKumar We present a general framework for training safe agents whose naive incentives are unsafe. E.g, manipulative o... 
151 ETHICAL ARTIFICIAL INTELLIGENCE
attributed to: Bill Hibbard
posted by: KabirKumar
This book-length article combines several peer-reviewed papers and new material to analyze the issues of ethical artificial intelligence (AI). The behavior of future AI systems can be described by mathematical equations, which are adapted to analyze possible unintended AI behaviors and ways that AI designs can avoid them. This article makes the case for utility-maximizing agents and for avoiding infinite sets in agent definitions...

152 TOWARDS HUMAN-COMPATIBLE XAI: EXPLAINING DATA DIFFERENTIALS WITH CONCEPT INDUCTION OVER BACKGROUND KNOWLEDGE
attributed to: Cara Widmer, Md Kamruzzaman Sarker, Srikanth Nadella, Joshua Fiechter, Ion Juvina, Brandon Minnery, Pascal Hitzler, Joshua Schwartz, Michael Raymer
posted by: KabirKumar
Concept induction, which is based on formal logical reasoning over description logics, has been used in ontology engineering in order to create ontology (TBox) axioms from the base data (ABox) graph. In this paper, we show that it can also be used to explain data differentials, for example in the context of Explainable AI (XAI), and we show that it can in fact be done in a way that is meaningful to a human observer. Our approach utilizes a large class hierarchy, curated from the Wikipedia category hierarchy, as background knowledge.

153 PATH-SPECIFIC OBJECTIVES FOR SAFER AGENT INCENTIVES
attributed to: Sebastian Farquhar, Ryan Carey, Tom Everitt
posted by: KabirKumar
We present a general framework for training safe agents whose naive incentives are unsafe. E.g., manipulative or deceptive behavior can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.
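To see what "maximizing only the causal effect not mediated by delicate state" can look like in the simplest possible case, the toy sketch below scores an action by the reward it would earn if the delicate variable were pinned to a default value, so manipulating that variable earns nothing. The environment, the `delicate_override` hook, and the scoring rule are illustrative assumptions, not the paper's actual Causal Influence Diagram construction.

```python
def path_specific_value(action, env_step, default_delicate):
    """Score an action by the reward it earns when any effect routed through
    the 'delicate' state variable is cut off by pinning it to a default value.

    env_step(action, delicate_override) is an assumed simulator hook that
    returns (reward, delicate_value); it is hypothetical, not a real API.
    """
    reward, _ = env_step(action, delicate_override=default_delicate)
    return reward

# Toy environment: 'manipulate' only pays off via the delicate variable,
# while 'honest' pays off directly.
def toy_env(action, delicate_override=None):
    delicate = 1.0 if action == "manipulate" else 0.0
    if delicate_override is not None:
        delicate = delicate_override
    direct = 0.6 if action == "honest" else 0.0
    return direct + delicate, delicate

print(path_specific_value("manipulate", toy_env, default_delicate=0.0))  # 0.0 - no incentive to manipulate
print(path_specific_value("honest", toy_env, default_delicate=0.0))      # 0.6
```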
154 EMPOWERMENT IS (ALMOST) ALL WE NEED
attributed to: Jacob Cannell
posted by: KabirKumar
One recent approach formalizes agents as systems that would adapt their policy if their actions influenced the world in a different way. Notice the close connection to empowerment, which suggests a related definition that agents are systems which maintain power potential over the future: having action output streams with high channel capacity to future world states. This all suggests that agency is a very general extropic concept and relatively easy to recognize.

155 THE ISITOMETER: A SOLUTION FOR INTRA-HUMAN AND AI/HUMAN ALIGNMENT (AND UBI IN THE PROCESS)
attributed to: Mentor of AIO
posted by: ISITometer
The ISITometer is a platform designed to accomplish the following three moonshot objectives:
1. Achieve a much higher degree of Intra-Humanity Alignment and Sensemaking
2. Enable AI-to-Human Alignment (Not vice versa)
3. Establish a sustainable, ubiquitous Universal Basic Income (UBI)
The ISITometer is a polling engine formatted as a highly engaging social game, designed to collect the perspectives of Humans on the nature of Reality. It starts at the highest levels of abstraction, as represented by the ISIT Construct, with simple, obvious questions on which we should be able to achieve unanimous agreement, and expands through fractaling derivative details. The ISIT Construct is a metamodern approach to the fundamental concepts of duality and polarity. Instead of relying on fanciful metaphors like Yin|Yang, Order|Chaos, and God|Devil that have evolved over the centuries in ancient religions and philosophies, the ISIT Construct establishes a new Prime Duality based on the words IS and IT. From this starting point, the ISIT Construct provides a path to map all of Reality (as Humanity sees it) from the highest level of abstraction to as much detail as we choose to explore.

156 PROVABLY SAFE SYSTEMS: THE ONLY PATH TO CONTROLLABLE AGI
attributed to: Max Tegmark, Steve Omohundro
posted by: Tristram
We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.

157 BOXED CENSORED SIMULATION TESTING: A META-PLAN FOR AI SAFETY WHICH SEEKS TO ADDRESS THE 'NO RETRIES' PROBLEM
posted by: NathanHelm-Burger
This plan suggests that high-capability general AI models should be tested within a secure computing environment (box) that is censored (no mention of humanity or computers) and highly controlled (auto-compute halts/slowdowns, restrictions on agent behavior) with simulations of alignment-relevant scenarios (e.g. with other general agents that the test subject is to be aligned to).

158 DEEP REINFORCEMENT LEARNING FROM HUMAN PREFERENCES
attributed to: Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
posted by: KabirKumar
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems...
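The preference-learning setup in the abstract above is usually implemented by fitting a reward model to pairwise comparisons with a Bradley-Terry style loss: the probability that one segment is preferred is a softmax over the summed predicted rewards of the two segments. The sketch below shows that loss on synthetic data; the network size, observation encoding, and training details are assumptions made for illustration rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a single observation (here a flat feature vector) to a scalar reward."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss on one pair of trajectory segments.

    seg_a, seg_b: tensors of shape (segment_len, obs_dim)
    prefer_a: 1.0 if the human preferred segment A, else 0.0
    """
    ret_a = model(seg_a).sum()          # predicted return of segment A
    ret_b = model(seg_b).sum()          # predicted return of segment B
    logits = torch.stack([ret_a, ret_b])
    target = torch.tensor(0 if prefer_a else 1)
    return nn.functional.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# Toy training step on synthetic data.
obs_dim = 8
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a, seg_b = torch.randn(20, obs_dim), torch.randn(20, obs_dim)
loss = preference_loss(model, seg_a, seg_b, prefer_a=1.0)
opt.zero_grad()
loss.backward()
opt.step()
```

An RL agent is then trained against the learned reward rather than the true one, while further comparisons are collected on its new behavior.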
159 AVOIDING WIREHEADING WITH VALUE REINFORCEMENT LEARNING
attributed to: Tom Everitt, Marcus Hutter
posted by: KabirKumar
How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent's actions...

160 SAFE MODEL-BASED MULTI-AGENT MEAN-FIELD REINFORCEMENT LEARNING
attributed to: Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco Corman, Andreas Krause, Ilija Bogunovic
posted by: KabirKumar
Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M^3-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics...

161 MODELING AGI SAFETY FRAMEWORKS WITH CAUSAL INFLUENCE DIAGRAMS
attributed to: Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg
posted by: KabirKumar
Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other. In this paper, we model and compare the most promising AGI safety frameworks using causal influence diagrams. The diagrams show the optimization objective and causal assumptions of the framework. The unified representation permits easy comparison of frameworks and their assumptions. We hope that the diagrams will serve as an accessible and visual introduction to the main AGI safety frameworks.
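A causal influence diagram of the kind described above is a directed acyclic graph whose nodes are labeled as chance, decision, or utility nodes. The sketch below builds a tiny one-step RL diagram with networkx; the library choice and the node names are mine, not taken from the paper.

```python
import networkx as nx

# A causal influence diagram is a DAG with nodes labeled as chance ('C'),
# decision ('D'), or utility ('U') nodes. This tiny example encodes a
# one-step RL setup: the state influences both the action and the reward,
# and the action influences the reward.
cid = nx.DiGraph()
cid.add_node("State", kind="C")
cid.add_node("Action", kind="D")
cid.add_node("Reward", kind="U")
cid.add_edge("State", "Action")    # information link: the agent observes the state
cid.add_edge("State", "Reward")
cid.add_edge("Action", "Reward")

assert nx.is_directed_acyclic_graph(cid)
for node, data in cid.nodes(data=True):
    parents = list(cid.predecessors(node))
    print(f"{node} ({data['kind']}): parents = {parents}")
```

Comparing frameworks then amounts to comparing which variables are observed, which are optimized, and which causal paths exist between them.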
162 LOVE IN A SIMBOX IS ALL YOU NEED
attributed to: Jacob Cannell
posted by: KabirKumar
We can develop self-aligning DL based AGI by improving on the brain's dynamic alignment mechanisms (empathy/altruism/love) via safe test iteration in simulation sandboxes.

163 ACTOR-NETWORK THEORY IN PARTICIPATORY DESIGN ON CREATING ETHICAL AND INCLUSIVE AI PROTOTYPES THROUGH STAKEHOLDER ENGAGEMENT
attributed to: For Publication in the Upcoming Responsible Tech Iterations Guide, Maira Elahi
posted by: (anon)
This alignment plan focuses on the integration of stakeholders in the participatory design and prototyping/iteration stages of AI development. Participatory design informed by actor-network theory (ANT) is intended to ensure that AI systems reflect stakeholder values and address their concerns. Prototyping and iteration involve developing early versions of AI systems based on stakeholder input and refining them through iterative feedback sessions. This approach promotes inclusion by incorporating diversity, addressing biases, and enhancing system performance.

164 RELAXED ADVERSARIAL TRAINING FOR INNER ALIGNMENT
attributed to: Evan Hubinger
posted by: KabirKumar
"This post is part of research I did at OpenAI with mentoring and guidance from Paul Christiano. It also represents my current agenda regarding what I believe looks like the most promising approach for addressing inner alignment." - Evan Hubinger

165 THE CASE FOR ALIGNING NARROWLY SUPERHUMAN MODELS
attributed to: Ajeya Cotra
posted by: KabirKumar
An overview and review of the case for aligning narrowly superhuman models.

166 ELICITING LATENT KNOWLEDGE: HOW TO TELL IF YOUR EYES DECEIVE YOU BY PAUL CHRISTIANO, AJEYA COTRA, AND MARK XU
posted by: (anon)
ELK stands for Eliciting Latent Knowledge. ELK seems to capture a core difficulty in alignment. The short description of the issue captured by the problem is that we don’t have surefire ways to understand the beliefs of models and systems that we train, and so if we’re ever in a situation where our systems know things that we don’t, we can’t be sure that we can recover that information.

167 SCALABLE AGENT ALIGNMENT VIA REWARD MODELING: A RESEARCH DIRECTION
attributed to: Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
posted by: KabirKumar
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning...
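At a high level, the reward-modeling direction sketched in the abstract above alternates between collecting user feedback on the agent's current behavior, refitting the reward model, and improving the policy against the learned reward. The loop below is only a schematic of that alternation: every callable it takes (`get_user_feedback`, `fit_reward_model`, `run_rl`, and the `policy.rollout` interface) is a placeholder named for illustration, not an existing API.

```python
def reward_modeling_loop(policy, reward_model, env, get_user_feedback,
                         fit_reward_model, run_rl, iterations=10):
    """Schematic outer loop for reward modeling.

    Placeholder callables standing in for project-specific code:
    - policy.rollout(env) -> one trajectory from the current policy
    - get_user_feedback(trajectories) -> labelled comparisons or ratings
    - fit_reward_model(reward_model, feedback) -> updated reward model
    - run_rl(policy, env, reward_model) -> policy improved against the learned reward
    """
    feedback = []
    for _ in range(iterations):
        # 1. Roll out the current policy to get fresh behavior to evaluate.
        trajectories = [policy.rollout(env) for _ in range(16)]
        # 2. Ask the user about that behavior and accumulate the feedback.
        feedback.extend(get_user_feedback(trajectories))
        # 3. Refit the reward model on everything collected so far.
        reward_model = fit_reward_model(reward_model, feedback)
        # 4. Optimize the policy against the *learned* reward, not the true one.
        policy = run_rl(policy, env, reward_model)
    return policy, reward_model
```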
168 GATO FRAMEWORK: GLOBAL ALIGNMENT TAXONOMY OMNIBUS FRAMEWORK
attributed to: David Shapiro and GATO Team
posted by: Diabloto96
The GATO Framework serves as a pioneering, multi-layered, and decentralized blueprint for addressing the crucial issues of AI alignment and the control problem. It is designed to circumvent potential cataclysms and actively construct a future utopia. By embedding axiomatic principles within AI systems and facilitating the formation of independent, globally distributed groups, the framework weaves a cooperative network, empowering each participant to drive towards a beneficial consensus. From model alignment to global consensus, GATO envisions a path where advanced technologies not only avoid harm but actively contribute to an unprecedented era of prosperity, understanding, and reduced suffering.

169 USING CONSENSUS MECHANISMS AS AN APPROACH TO ALIGNMENT
attributed to: Prometheus
posted by: Prometheus
Using Mechanism Design and forms of Technical Governance to approach alignment from a different angle, trying to create a stable equilibrium that can scale as AI intelligence and proliferation escalate, with safety mechanisms and aligned objectives built into the greater network.

170 HIGH-LEVEL INTERPRETABILITY
posted by: (anon)
Very broadly speaking, high-level interpretability involves taking some high-level aspect of AI systems whose mechanistic properties would be really useful to understand within a particular model, and focusing our efforts on understanding it better conceptually in order to undertake highly targeted interpretability research toward it.

171 A GENERAL LANGUAGE ASSISTANT AS A LABORATORY FOR ALIGNMENT
attributed to: Anthropic (Full Author list in Full Plan- click title to view)
posted by: KabirKumar
Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. ... (Full Abstract in Full Plan- click title to view)

172 OPEN AGENCY ARCHITECTURE
attributed to: Davidad
posted by: KabirKumar
Utilize near-AGIs to build a detailed world simulation, train and formally verify within it that the AI adheres to coarse preferences and avoids catastrophic outcomes.

173 AI ALIGNMENT METRIC - LIFE (EXTENDED DEFINITION)
attributed to: Mars Robertson 🌱 Planetary Council
posted by: Mars
This has been posted on my blog: https://mirror.xyz/0x315f80C7cAaCBE7Fb1c14E65A634db89A33A9637/ETK6RXnmgeNcALabcIE3k3-d-NqOHqEj8dU1_0J6cUg ➡️➡️➡️ check it out for better formatting ⬅️⬅️⬅️
TLDR summary, extended definition of LIFE:
1. LIFE (starting point and then extending the definition)
2. Health, including mental health, longevity, happiness, wellbeing
3. Other living creatures, biosphere, environment, climate change
4. AI safety
5. Mars: backup civilisation is fully aligned with the virtue of LIFE preservation
6. End the Russia-Ukraine war, global peace
7. Artificial LIFE
8. Transhumanism, AI integration
9. Alien LIFE
10. Other undiscovered forms of LIFE

174 ENABLING ROBOTS TO COMMUNICATE THEIR OBJECTIVES
attributed to: Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan
posted by: KabirKumar
The overarching goal of this work is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. Since a robot's behavior is often a direct result of its underlying objective function, our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally develop such a mental model over time through observing the robot act, this familiarization process may be lengthy... (Full Abstract in Full Plan- click title to view)
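The abstract above argues that users predict a robot by holding an accurate mental model of its objective, and that this model is built by watching the robot act. One way to make that intuition concrete, purely as an illustrative assumption on my part rather than the paper's algorithm, is Bayesian belief updating: each observed choice reweights candidate objectives by how well they explain it, so a well-chosen demonstration can shift the viewer's beliefs far more than a long stream of unremarkable behavior.

```python
import numpy as np

def update_belief(prior, candidate_weight_vectors, chosen_features, rejected_features, beta=5.0):
    """Re-weight candidate objectives after watching the robot pick one trajectory over another.

    Each candidate objective is a weight vector w; the robot is modelled as
    noisily preferring the trajectory with higher w.features. The softmax
    temperature beta and the feature representation are assumptions made for
    this toy example.
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = []
    for w in candidate_weight_vectors:
        u_chosen = beta * np.dot(w, chosen_features)
        u_rejected = beta * np.dot(w, rejected_features)
        # Probability the robot picks the observed trajectory under this objective.
        p = np.exp(u_chosen) / (np.exp(u_chosen) + np.exp(u_rejected))
        likelihoods.append(p)
    posterior = prior * np.array(likelihoods)
    return posterior / posterior.sum()

# Two candidate objectives: 'cares about speed' vs 'cares about safety'.
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
prior = [0.5, 0.5]
# The robot took the slower-but-safer route: features = [speed, safety].
posterior = update_belief(prior, candidates,
                          chosen_features=np.array([0.2, 0.9]),
                          rejected_features=np.array([0.9, 0.1]))
print(posterior)  # belief shifts toward the safety objective
```

In this framing, an informative demonstration is one that sharply concentrates the viewer's posterior, which is why a few well-chosen examples can substitute for lengthy familiarization.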
Hello, welcome to AI-Plans.com
This is an open platform for AI alignment plans and a living peer review of their strengths and vulnerabilities. You can browse and search the library of alignment plans for research relevant to problems you care about. If you register an account, you can share feedback and add your own plans. Feedback can be marked as a Strength or a Vulnerability. If you have several separate ideas, it is better to submit them as individual Strengths or Vulnerabilities, since that will allow other users to consider each of your ideas separately. For more information, see the Substack.
Thank you for being here!