Executive Summary

Key findings, framework overview, and main recommendations at a glance.

Despite rising global attention to loss of control (LoC), decision- and policymakers are still operating without an actionable definition of it. This could both encourage ‘crying wolf’ over scenarios that fall short of LoC and prevent stakeholders from accurately forecasting and assessing early warning signs of LoC. Existing LoC definitions vary on a wide range of parameters, so that even the two most consensus-based definitions (those in the International AI Safety Report and the European Union AI Act’s Code of Practice for General-Purpose AI Models) differ in both the spectrum of LoC outcomes covered and the expected timelines for these outcomes.

Figure: scatter plot positioning 12 loss-of-control scenarios by severity and persistence. This graph plots 12 concrete LoC scenarios identified in the literature. We used economic impact as a proxy for severity and persistence, both of which are mapped on arbitrary axes of 0-100. These data points inform our proposed three-part taxonomy, which distinguishes Deviation, Bounded LoC, and Strict LoC. By sharpening the conceptual boundaries of LoC, this taxonomy helps decision-makers understand the different degrees of LoC and hence better prioritize between risk-reduction strategies. See Section 1.A.2 for more detail on the methodology behind this visualization and its limitations.

This research report aims to address existing divergences and make current LoC definitions and conceptualizations action-ready by demonstrating that it is possible to conceptualize multiple ‘degrees’ of LoC. We present these degrees through a novel taxonomy of LoC based on an extensive literature review and methodology that allows us to place concrete LoC scenarios along two axes: severity (i.e., how many people are affected by a LoC event and to what degree they are affected), and persistence (i.e., how difficult it is to interrupt the ‘harm trajectory’ of a LoC event). Our resulting taxonomy helps decision- and policymakers visualize the lower and upper boundaries of the LoC spectrum more clearly, and better conceptualize the near- and longer-term threats that LoC could pose to national security and humanity. Specifically, drawing from our literature review, our taxonomy distinguishes between the following three degrees:

Deviation: captures events that cause some harm or inconvenience but lack the requisite severity and persistence to reach the economic consequences threshold that the U.S. Department of Homeland Security, Intelligence Community, and other components use to demarcate national-level events in the Strategic National Risk Assessment.
Bounded LoC: captures events that cause great damage or suffering, and are difficult but not impossible to contain, albeit potentially at great cost. Bounded LoC captures threats or hazards that could significantly impact U.S. homeland security.
Strict LoC: captures events that are maximally severe and permanent, namely events that result in humanity as a whole becoming extinct.
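
As a rough illustration of how the three degrees partition the severity and persistence axes, the sketch below classifies a scenario from two 0-100 scores. It is a minimal sketch under stated assumptions, not the report's methodology: the names `Scenario`, `Degree`, and `classify` are hypothetical, the cut-off `NATIONAL_LEVEL_THRESHOLD` is an invented placeholder (the summary gives no numeric value), and requiring both scores to reach the cut-off for Bounded LoC is one possible reading of the definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Degree(Enum):
    DEVIATION = "Deviation"
    BOUNDED_LOC = "Bounded LoC"
    STRICT_LOC = "Strict LoC"


@dataclass(frozen=True)
class Scenario:
    name: str
    severity: float     # 0-100, arbitrary axis as in the figure
    persistence: float  # 0-100, arbitrary axis as in the figure


# ASSUMPTION: placeholder cut-off standing in for the national-level
# economic consequences threshold; the value is not from the report.
NATIONAL_LEVEL_THRESHOLD = 40.0
MAX_SCORE = 100.0


def classify(scenario: Scenario) -> Degree:
    """Map a scenario to one of the three degrees of LoC."""
    for score in (scenario.severity, scenario.persistence):
        if not 0.0 <= score <= MAX_SCORE:
            raise ValueError("scores must lie in the range 0-100")

    # Strict LoC: maximally severe and permanent.
    if scenario.severity == MAX_SCORE and scenario.persistence == MAX_SCORE:
        return Degree.STRICT_LOC

    # Bounded LoC (assumed reading): both scores reach the threshold.
    if (scenario.severity >= NATIONAL_LEVEL_THRESHOLD
            and scenario.persistence >= NATIONAL_LEVEL_THRESHOLD):
        return Degree.BOUNDED_LOC

    # Everything below the threshold is a Deviation.
    return Degree.DEVIATION


if __name__ == "__main__":
    for s in (Scenario("minor outage", 10, 20),
              Scenario("hard-to-contain incident", 60, 70),
              Scenario("extinction", 100, 100)):
        print(s.name, "->", classify(s).value)
```

The scenario names and scores in the usage loop are invented for illustration; placing real scenarios on the axes would require the methodology described in Section 1.A.2.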

We subsequently put forward a ‘playbook’ that can help decision- and policymakers build straightforward and actionable preparedness for LoC today and in the near future. Given existing limitations in the current understanding of the role of capabilities and propensities (the ‘intrinsic’ factors) in contributing to LoC, we instead focus our attention on the LoC dynamics enabled by ‘extrinsic’ factors. Specifically, we focus on an AI system’s:¹ (i) deployment context, meaning the combination of a given AI system’s intended use case and the specific environment within which an AI system is deployed; (ii) affordances, meaning the environmental resources and opportunities for affecting the world available to an AI system; and (iii) permissions, meaning the set of authorizations an AI system is given to exercise its capabilities through the available affordances. We refer to this focus on leveraging extrinsic factors to manage LoC threats as the ‘DAP framework’. For each component, we suggest the following steps:

Deployment context: (1) reviewing the ‘composition’ of the deployment context (i.e., the environment and use case), and clarifying whether the deployment context should be considered as ‘high-stakes’ (e.g., critical national infrastructure, military, AI research and development) or not; and (2) assessing the potential for cascading failures across interconnected AI- and non-AI systems, including through threat modeling and red teaming.
Affordances: (1) considering whether an affordance is necessary to achieve the intended task; (2) considering every action that an affordance could enable, and the negative consequences thereof, and, consequently, limiting the affordance as much as feasible to reduce the risks through permissions; and (3) accounting for the potential for future, highly advanced AI systems to manipulate insufficiently informed human users into giving the AI system additional affordances.
Permissions: (1) restricting permissions to the minimum necessary for an AI system to complete the task, in line with the well-established principle of least privilege; (2) weighing the benefits and risks of reduced human oversight against the benefits and risks that increased permissions could bring; and (3) accounting for the potential for future, highly advanced AI systems to manipulate insufficiently informed human users into giving the AI system additional permissions.
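
The affordance and permission steps above can be sketched as a simple set comparison between what a deployment grants and what the task requires. This is a minimal illustration under assumptions, not a tool from the report: `Deployment`, `review`, and `least_privilege` are hypothetical names, and modelling a permission as an `(affordance, action)` pair is a simplification chosen here for clarity.

```python
from dataclasses import dataclass, replace

# ASSUMPTION: a permission is modelled as an (affordance, action) pair.
Permission = tuple[str, str]


@dataclass(frozen=True)
class Deployment:
    use_case: str
    environment: str
    high_stakes: bool                  # e.g. critical national infrastructure
    affordances: frozenset[str]        # resources available to the system
    permissions: frozenset[Permission] # authorizations to use them


def review(d: Deployment, required: frozenset[Permission]) -> dict[str, list]:
    """Compare what is granted with what the task actually needs."""
    needed_affordances = {affordance for affordance, _ in required}
    return {
        # Affordances that no required permission relies on.
        "unnecessary_affordances": sorted(d.affordances - needed_affordances),
        # Permissions beyond the minimum necessary for the task.
        "excess_permissions": sorted(d.permissions - required),
        # Required permissions that have not been granted.
        "missing_permissions": sorted(required - d.permissions),
    }


def least_privilege(d: Deployment, required: frozenset[Permission]) -> Deployment:
    """Return a copy of the deployment trimmed to the minimum necessary."""
    needed_affordances = {affordance for affordance, _ in required}
    return replace(
        d,
        affordances=d.affordances & needed_affordances,
        permissions=d.permissions & required,
    )


if __name__ == "__main__":
    deployment = Deployment(
        use_case="ticket triage",
        environment="internal helpdesk",
        high_stakes=False,
        affordances=frozenset({"tickets", "email", "shell"}),
        permissions=frozenset({("tickets", "read"), ("tickets", "write"),
                               ("email", "send"), ("shell", "execute")}),
    )
    required = frozenset({("tickets", "read"), ("tickets", "write")})
    print(review(deployment, required))
    print(least_privilege(deployment, required))
```

A set comparison of this kind covers only the mechanical part of the steps. Judging whether a deployment context is high-stakes, or whether a human user is being manipulated into widening permissions, remains a matter for human review.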

We then reflect on a set of societal and technical dynamics and how these could affect future resilience to LoC. Specifically, we reflect on the likelihood of continuous AI capability progress, and on the growing economic and strategic pressures and incentives to leverage AI systems in more complex and high-stakes deployment contexts, endowing them with broader affordances and permissions. We propose that unless these dynamics are handled strategically, it is likely that society will eventually encounter a ‘state of vulnerability.’ We use the expression ‘state of vulnerability’ to describe a necessary precondition in which future, highly advanced AI systems have acquired or could independently acquire sufficient access to resources, affordances, permissions, and capabilities to cause LoC once a catalyst materializes. As LoC catalysts, we envision both: (i) malfunctions that constitute misalignment; and (ii) malfunctions that do not (‘pure malfunctions’).

Subsequently, we model several theoretical pathways that a future with a state of vulnerability could take with regard to the threat of LoC. In doing so, we propose that society would very likely encounter a LoC threat eventually. As part of our theoretical model, we suggest that the core catalysts of LoC (misalignment or pure malfunctions) are unlikely to be fully resolved. Even if the alignment problem were solved, pure malfunctions might still occasionally occur, since it is difficult to ensure their non-occurrence ex ante; this makes a future where society lives with a state of vulnerability precarious. Finally, we suggest that if humanity reaches a state of vulnerability, preparedness means being able to hold the state of vulnerability in a condition of perennial suspension, including through a combination of:

Governance interventions, such as: (1) concrete threat modeling; (2) policies describing acceptable deployments; and (3) wide-reaching, easy-to-enact emergency response plans.
Technical interventions, such as: (1) rigorous pre-deployment testing suites in accordance with threat models for the deployment context; (2) control measures that constrain an AI system’s effect on the world around the AI system; and (3) stringent human and AI-enabled monitoring.
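
The argument above, that a persistent state of vulnerability makes an eventual LoC threat hard to avoid, can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that a catalyst occurs independently in each period with a constant probability; neither this assumption nor the function name `prob_at_least_one_catalyst` comes from the report, and the input values are invented.

```python
def prob_at_least_one_catalyst(p_per_period: float, periods: int) -> float:
    """Probability that at least one catalyst occurs over `periods` periods.

    ASSUMPTION: catalysts are independent across periods and occur with
    the same probability in each one. This is a toy model only.
    """
    if not 0.0 <= p_per_period <= 1.0:
        raise ValueError("p_per_period must be a probability in [0, 1]")
    if periods < 0:
        raise ValueError("periods must be non-negative")
    # Complement of "no catalyst in any period".
    return 1.0 - (1.0 - p_per_period) ** periods


if __name__ == "__main__":
    # Invented inputs: compare a higher and a lower per-period probability.
    for p in (0.01, 0.001):
        for n in (10, 100, 1000):
            print(f"p={p}, periods={n}: {prob_at_least_one_catalyst(p, n):.3f}")
```

Under these assumptions, any non-zero per-period probability drives the cumulative probability toward 1 as the number of periods grows. Interventions that lower the per-period probability, or that limit what a catalyst can trigger, slow that growth without eliminating it, which is one way to read the notion of holding the state of vulnerability in suspension.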

References

Sharkey, Lee, Clíodhna Ní Ghuidhir, Dan Braun, et al. 2024. A Causal Framework for AI Regulation and Auditing. https://doi.org/10.20944/preprints202401.1424.v1.

Footnotes

¹ We follow the definition of an AI system in Sharkey et al. (2024), under which AI systems “include not only the weights and architecture of the AI system, but also include a broader set of system Parameters [...]. These consist of retrieval databases and particular kinds of prompts.”