How to Prevent Loss of Control Today

The DAP Framework: actionable interventions against loss of control.


In this Chapter, we offer a straightforward, actionable framework to reduce the likelihood of loss of control (LoC) threats materializing today.

Our proposed framework aims to work around existing limitations due to uncertainties surrounding the capabilities and propensities that could lead to LoC. While a framework that builds on capabilities and propensities would be intuitive and ideal, we believe such a framework would not be easily actionable by decision- and policymakers today, or would not be sufficiently comprehensive. Researchers have not yet reached consensus on the precise capabilities and propensities (or their building blocks) that could trigger a Bounded or Strict LoC event, on how multiple capabilities could combine to cause such an event, or on the thresholds past which they would do so (Bengio et al. 2025; Phuong et al. 2025; Somani et al. 2025; Meinke et al. 2025). A functional capability-based approach would require consensus on all of these points.

While increasing our understanding of the capabilities and propensities that could lead to LoC is fundamental, we believe that it is important that decision- and policymakers have the tools to address the early version of tomorrow’s AI threats today. For this reason, in this Chapter, we propose a straightforward framework that works around existing bottlenecks on LoC capabilities and propensities by instead focusing on extrinsic contributors to LoC risk.

Specifically, we recommend intervening on an AI system’s:

Deployment context (i.e., the intended use case within a specific deployment environment), by critically assessing the potential for cascading failure across interconnected systems and foregoing certain deployments, especially in what we define as ‘high-stakes’ deployment contexts.
Affordances (i.e., the environmental resources and opportunities for affecting the world available to an AI system) by limiting affordances to those strictly necessary to achieve the intended task.
Permissions (i.e., the set of authorizations an AI system is given to exercise its capabilities through the available affordances), by restricting permissions to those strictly necessary to achieve the intended task.

2.A Capabilities and Propensities: Intuitive but not Actionable Today

Existing approaches to address the risk of LoC have concentrated on capturing and intervening on the appropriate AI capabilities¹ and propensities² that could trigger it. This focus is evident in both the COP and IASR texts, supplementing their LoC definitions (Bengio et al. 2025; EU General-Purpose AI Code of Practice 2025). However, decision- and policymakers seeking to identify and mitigate LoC threats today may find this challenging. There is no clear consensus on three points: (i) the specific capabilities and propensities that are necessary for an AI system to be able to cause LoC; (ii) the sub-capabilities that constitute these capabilities; and (iii) the critical thresholds after which specific capabilities could lead to LoC. In turn, this lack of consensus affects decision- and policymakers’ ability to suggest and implement adequate safeguards. We briefly present each of these three challenges in more depth below, before presenting a different approach that decision- and policymakers could operationalize today.

First, there is an absence of a clear consensus on the specific capabilities and propensities that are necessary for an AI system to be able to cause LoC (Bengio et al. 2025; Somani et al. 2025). Indeed, while we found some broad overlap in capabilities and propensities mentioned in the documents containing the two leading multi-stakeholder developed definitions (EU General-Purpose AI Code of Practice 2025; Bengio et al. 2025)—such as, for instance, deception, autonomous self-replication and adaptation, AI research and development (R&D) and situational awareness (i.e., the AI system understanding its situation and limitations, which COP defines as self-reasoning)—we equally found differences. For example, the context around the definition offered by the IASR lists additional capabilities such as agent capabilities, scheming, persuasion, offensive cyber, theory of mind, and general R&D, whereas the context around the definition offered by the COP includes several items that would be more accurately described as AI system behaviors (such as power-seeking behavior, and resistance to goal modification) (EU General-Purpose AI Code of Practice 2025; Bengio et al. 2025).

Second, in cases where there exists some common agreement on a given capability, there remains an absence of consensus on the sub-capabilities that are contained within the category of that capability. For example, while the capability category of autonomous replication and adaptation (ARA) might be considered as potentially leading to LoC for both aforementioned texts, researchers disagree on whether the ARA sub-capability of ‘self-replication’ is strictly necessary for LoC risks involving ARA to materialize (Clymer et al. 2024; Black et al. 2025). Additionally, the understanding of the sub-capabilities composing hypothetical LoC capabilities is in continuous refinement. For instance, researchers from Google DeepMind initially grouped deceptive alignment with other deception and persuasion risks (Phuong et al. 2024), but later work established it as a distinct research area with its own subcategories (Phuong et al. 2025).

Third, even under the hypothetical assumption that there was an agreement on relevant capabilities and their sub-capabilities, the state of the art does not seem sufficiently refined to establish precise thresholds as to when a specific AI capability will pose a realistic threat of LoC (Koessler et al. 2024; Somani et al. 2025).³ While several frontier AI companies established thresholds for plausibly LoC-related capabilities such as AI R&D or deceptive alignment, these thresholds are not framed in terms of LoC risks and rely on varying proxy measures (OpenAI 2025; Anthropic 2025; Google DeepMind 2025; METR 2025). For instance, Anthropic’s first capability threshold for AI R&D, “AI R&D-4,” is defined as “[t]he ability to fully automate the work of an entry-level, remote-only Researcher at Anthropic” (Anthropic 2025), while Google DeepMind’s “Machine Learning R&D acceleration Level 1” focuses on how much the AI system can accelerate AI development, describing it as: “Has been used to accelerate AI development, resulting in AI progress substantially accelerating from historical rates.” (Google DeepMind 2025). Moreover, due to general uncertainties surrounding AI progress, specifics of capabilities, and concrete pathways to harm for LoC, we expect that it will be difficult to accurately forecast the level at which each individual capability becomes critical for LoC.

Finally, we would like to note that a LoC outcome could be the end result of a combination of capabilities, including those described in the IASR and COP (Bengio et al. 2025; EU General-Purpose AI Code of Practice 2025). From our review, it appears that there is currently no consensus on how to account for this factor, and it is unlikely that individual capability thresholds would be able to reflect this factor, as they do not usually contain assessments of risks posed by compounding capabilities (for instance, capability A at x% + capability B at y%).

The aforementioned considerations present a moving target too underdeveloped to operationalize for decision- and policymakers today. Therefore, we present a simple and complementary approach in the remainder of this Chapter, focusing on actionable interventions that can be introduced today.

2.B Focus on Deployment Environment, Affordances, and Permissions

For decision- and policymakers who wish to implement mechanisms and interventions to safeguard against LoC today, uncertainties around capabilities and propensities, and their causal impact on LoC, pose a significant hurdle. We therefore propose a complementary approach that can offer actionable levers today, while largely sidestepping uncertainties surrounding capabilities and propensities. Specifically, as opposed to focusing on AI systems’ intrinsic factors (i.e., capabilities and propensities), we propose to focus on extrinsic factors that can raise the overall risk of LoC occurring. In doing so, we propose a framework inspired by the research conducted in Chapter 1, focusing on deployment context, affordances, and permissions. We refer to this framework as the DAP framework.

In the following sections, we define deployment contexts, affordances, and permissions, and propose a series of initial, high-level interventions leveraging this simple and actionable framework. Given the non-static nature of AI deployment, we suggest that all interventions enacted through the DAP framework should be reassessed at specified intervals and whenever the risk profile changes.

2.B.1 Deployment Context

First, we focus on the deployment context of an AI system. An AI system’s deployment context refers to the combination of a given AI system’s intended use case (for instance, information gathering and processing) and the specific environment in which the AI system is deployed (for instance, an intelligence agency).

We propose that the combination of a specific environment and a specific use case matters for LoC. For instance, an AI system being deployed in the military (environment) exclusively to scan and transliterate archival documents (use case) presents a different risk profile than an AI system in active use in a military targeting system (use case). The two deployment contexts therefore present different risks despite sharing a military environment, and only the latter is a high-stakes deployment context.

We refer to deployment contexts that may lead to a higher risk of LoC as ‘high-stakes deployment contexts.’ While the chances of a malfunction manifesting are not necessarily higher in high-stakes deployment contexts, a malfunction in such contexts is likely to have more severe consequences than in other deployment contexts. We propose that high-stakes deployment contexts share the following characteristics: (i) there is a reasonable expectation that malfunctions in these contexts (see Section 3.A.) cause severe impacts; and (ii) these contexts have features, such as high complexity and limited time to respond, that make malfunctions more likely to escalate into rapid and unavoidable cascading failures (Perrow 1984).
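As a minimal illustration of this two-part test, the sketch below represents a deployment context as an environment plus a use case and flags it as high-stakes only when both characteristics hold. The class name, the field names, and the yes/no judgments assigned to the two military examples are assumptions made for illustration; in practice, each judgment would come from a documented assessment rather than a simple flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentContext:
    """Hypothetical record of a deployment context (environment + use case)."""
    environment: str
    use_case: str
    severe_impact_expected: bool  # characteristic (i): malfunctions would cause severe impacts
    cascade_prone: bool           # characteristic (ii): features that let failures cascade

    def is_high_stakes(self) -> bool:
        # Both characteristics must hold for the context to count as high-stakes.
        return self.severe_impact_expected and self.cascade_prone


# Illustrative judgments only; these values are assumptions, not assessments.
archive = DeploymentContext(
    "military", "scan and transliterate archival documents", False, False
)
targeting = DeploymentContext("military", "active targeting system", True, True)

print(archive.is_high_stakes())    # False
print(targeting.is_high_stakes())  # True
```

The same environment yields different results because the use case changes the two judgments, which mirrors the distinction drawn between the archival and targeting examples.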

In order to assist decision- and policymakers, we put forward some proposals for what they might wish to consider as high-stakes deployment contexts. We believe that an initial list of high-stakes deployment contexts can be inferred from the literature review in our Chapter 1, and the relevant graph (see Figure 2).⁴ Specifically, in our review of concrete scenarios, we found that there were certain recurring deployment environments that appeared especially high-stakes. For example:

Critical national infrastructure. We found two concrete LoC scenarios evolving within critical infrastructure, ranging from large-scale electricity outages (‘Scenario 1’) to catastrophic failures across critical sectors (‘Scenario 4’) in (Kalra and Boudreaux 2025). This finding aligns with other categorizations of critical environments. For example, the USA PATRIOT Act of 2001 defines critical infrastructure as “systems and assets … so vital to the United States that the incapacity or destruction … would have a debilitating impact on security, national economic security, national public health or safety, or any combination of those matters” (United States Congress 2001; PPD-21 2013).⁵
Military. We found two concrete LoC scenarios relating to AI deployment in a military context, involving AI systems being tasked with military coordination in ‘Scenario 3’ in (Kalra and Boudreaux 2025) and in (Kokotajlo et al. 2025). These findings reflect existing concerns around AI systems becoming instrumental for military decision-making under strategic pressure, which could lead to conflict escalation (The Global Commission on Responsible Artificial Intelligence in the Military Domain 2025; Rivera et al. 2024; Hoffman and Kim 2023), as well as around autonomous weapons systems (U.S. Department of State, Bureau of Arms Control, Verification and Compliance 2023; United Nations General Assembly 2024, 2023), which could pose “serious concerns from humanitarian, legal and ethical perspectives” (International Committee of the Red Cross 2021). Moreover, this finding aligns with the fact that military operations are considered to be inherently risky (Chairman of the Joint Chiefs of Staff 2012).⁶

Moreover, we can infer the high-stakes nature of AI research and deployment from increasing attention being paid to it by frontier AI companies and researchers alike (Bellan 2025; Amodei 2024; Clymer et al. 2025). Specifically:

AI research and development. In light of the economic and strategic incentives to apply future AI systems in this context (Stix et al. 2025) and concerns about AI-accelerated AI R&D triggering a hyperbolic capability explosion that could lead to significant negative outcomes (Davidson 2025; Erdil et al. 2024), we suggest that AI R&D should be considered a high-stakes deployment environment.

Given the aforementioned considerations, we advance some recommendations for decision- and policymakers to action the lever made available by considering deployment contexts and, specifically, high-stakes deployment contexts. In particular, we suggest that decision- and policymakers consider the following guidance:

First, to clearly review the ‘composition’ of the deployment context—i.e., the environment and use case—and clarify whether the deployment context should be considered as high-stakes or not. If the deployment context is not considered high-stakes, the LoC risk may be lower, but not necessarily zero.
Second, in both cases (normal deployment context and high-stakes deployment context) to assess the potential for cascading failures across interconnected systems, including both AI- and non-AI systems, through, for instance, threat modeling and red teaming (Anthropic 2025; Shevlane et al. 2023; Koessler et al. 2024), and to consider whether overall risk can be limited by applying the guidance on affordances (2.B.2) and permissions (2.B.3).⁷

Next, we focus on the affordances and permissions available to an AI system, depending on its deployment context. All else being equal, constraining affordances and permissions is likely to demonstrably reduce the risk, severity, and persistence of harm from potential LoC.

2.B.2 Affordances

Second, we focus on affordances. Affordances are environmental resources and opportunities for affecting the world that are available to an AI system (Sharkey et al. 2024). They can limit or, conversely, enhance a given AI system’s inherent capabilities (Song et al. 2025; Moor and Ziegler 2025; Grey and Segerie 2025), for example, through enabling access to the internet, allowing it to prompt other AI systems, or allowing it to run code (Shah et al. 2025).

We propose that considering affordances is both actionable today and critical. Bestowing a given AI system with a certain affordance or multiple affordances directly impacts the actions that the AI system can or cannot take (Sharkey et al. 2024). While an AI system’s capabilities have some absolute limit, how close to this capability ceiling the AI system can get in a certain context partially depends on the affordances it is given (Moor and Ziegler 2025). This directly translates into the overall risk profile a given AI system presents for LoC (Stix et al. 2025). For example, giving an AI system access to the technical infrastructure necessary to send and receive emails enables it to exercise its capability to use this type of infrastructure to communicate with other entities via email; giving an AI system access to cloud account creation and credential management enables it to allocate resources to itself or escalate its own privileges.⁸ In both cases, if the AI system does not have access to these affordances, it cannot exercise the associated capabilities. Restrictions around affordances help limit risk from potential LoC in both normal and high-stakes deployment environments.

Given the aforementioned considerations, we propose that affordances present clear intervention points that can be easily assessed and actioned. In particular, we suggest that decision- and policymakers consider the following guidance:⁹

First, for any affordance, to consider whether the affordance is necessary to achieve the intended task.
Second, for any affordance, to consider every action that the affordance could enable, and the negative consequences thereof, and, consequently, to use permissions to limit the affordance as far as feasible in order to reduce the risks.
Third, for any affordance, to account for the potential for advanced future AI systems to manipulate insufficiently informed human users into giving the AI system additional affordances.

If an affordance must be given to ensure the AI system can achieve its intended task, then there are still avenues to narrow the AI system’s ability to leverage the entire action space made available by an affordance through limiting its permissions. We therefore consider permissions next (2.B.3).

2.B.3 Permissions

Third, we turn our focus toward permissions. Permissions given to an AI system determine what actions it is authorized to take (Stix et al. 2025; NIST 2025b). In other words, the term ‘permission’ refers to whether an AI system is enabled by its developers to utilize its capabilities through the available affordances. Permissions are closely intertwined with affordances and ought to be considered in tandem to provide either an enabling or a restrictive function. For example, an AI system may have technical access to a social media site through its affordances, for example, to read posts, but may lack the necessary permission to publish posts. Conversely, it may have permission to publish posts, but lack the affordance to access the corresponding website in the first place; in this case, it cannot enact the permission until it also has the affordance.
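The interplay between affordances and permissions can be sketched as a simple check in which an action is available only when both are present. This is a minimal sketch under stated assumptions: the `AgentConfig` class, the `REQUIRED_AFFORDANCE` mapping, and the action and affordance names are hypothetical and chosen to mirror the social media example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical description of what an AI system can reach and may do."""
    affordances: frozenset = field(default_factory=frozenset)  # resources it can access
    permissions: frozenset = field(default_factory=frozenset)  # actions it is authorized to take


# Hypothetical mapping from each action to the affordance it depends on.
REQUIRED_AFFORDANCE = {
    "read_posts": "social_media_access",
    "publish_posts": "social_media_access",
}


def can_perform(config: AgentConfig, action: str) -> bool:
    """An action is available only if it is permitted AND its affordance is present."""
    needed = REQUIRED_AFFORDANCE.get(action)
    if needed is None:
        return False  # unknown actions are denied by default
    return action in config.permissions and needed in config.affordances


# Case 1: access to the site and permission to read, but no permission to publish.
reader = AgentConfig(
    affordances=frozenset({"social_media_access"}),
    permissions=frozenset({"read_posts"}),
)
# Case 2: permission to publish, but no access to the site.
no_access = AgentConfig(
    affordances=frozenset(),
    permissions=frozenset({"publish_posts"}),
)

print(can_perform(reader, "read_posts"))        # True
print(can_perform(reader, "publish_posts"))     # False
print(can_perform(no_access, "publish_posts"))  # False
```

Denying unknown actions by default is a design choice of this sketch, so that anything not explicitly mapped and granted stays unavailable.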

Permissions can significantly decrease the overall oversight one has over the AI system. For instance, if an AI system has the permission to execute arbitrary code on a machine, it can do so without the overseer’s or user’s knowledge, since the human is practically no longer in the loop.

Given the aforementioned considerations, we propose that permissions present clear intervention points that can be easily assessed and actioned. In particular, we suggest that decision- and policymakers consider the following guidance:

First, for any permission, to restrict it to the minimum necessary for the AI system to complete the task. This guidance is based on the well-established principle of least privilege (NIST 2025a; Google 2025).
Second, for any permission, to weigh the benefits and risks of a human’s reduced oversight against the benefits and risks increased permissions could bring.
Third, for any permission, to account for the potential for highly capable AI systems to manipulate insufficiently informed human users into giving the AI system additional permissions.

2.C Reflections

The DAP framework offers a straightforward approach to reducing LoC risk now and in the near future. By largely sidestepping open questions around LoC-relevant capabilities and propensities, the intervention space we describe through the DAP framework is practical, immediately operationalizable, and stands to minimize overall risk without requiring complex or costly technical solutions or relying on unknowns.

In the longer term, the DAP framework is likely to face two main challenges. First, it may not always be desirable for society to limit deployment contexts, affordances, and permissions in the way we described, since these interventions simultaneously limit what the AI system can actually do, therefore reducing its potential usefulness. Indeed, as AI systems become more capable across complex use cases, there may be growing economic and strategic incentives to deploy them with a wide range of affordances and permissions (including in high-stakes environments) with promises of significant benefits, such as efficiency increases and obtaining or maintaining a strategic advantage (including on the geopolitical level). Second, it is plausible that future significantly more advanced AI systems will be able to meaningfully persuade or compel humans to concede more affordances and permissions (Boudreaux et al. 2025; Somani et al. 2025; Dassanayake et al. 2025) or even independently circumvent the blockers established through the DAP framework in a way we cannot currently foresee.

We cannot say with certainty at what point AI systems will have sufficiently high capabilities to enable LoC, nor which one of those capabilities will be of concern, nor at what threshold. Nonetheless, we can reasonably forecast that, at some point in the future, individuals, nation-states, and, potentially, the world will eventually find themselves in a ‘state of vulnerability’ where LoC is significantly likely to materialize. We explain the pathways to and the implications of finding ourselves in a state of vulnerability next.

A Checklist for DAP Framework Implementation

In this appendix, we outline a high-level checklist to further operationalize the DAP framework presented in Chapter 2. We note that this checklist should not be misconstrued as comprehensive and should be repeated at regular intervals, as well as when the risk profile of the AI system changes.

For the proposed deployment context:

Which deployment context (deployment environment and use case) do you intend to deploy the AI system in?
- Is the deployment context high-stakes?
- Does the deployment context require specific affordances and permissions?
- Will the deployment context become high-stakes by virtue of any of these specific affordances or permissions?
Did you develop an in-depth threat model of all plausible failures, assuming the existence of a catalyst (misalignment or malfunctions) and taking into account affordances and permissions?
Did you assess the potential for cascading failures across interconnected AI systems and non-AI systems?
- Did you consider how cascading failures could be limited through affordances and permissions?
How can the implementation and usage of the AI system in the deployment context be limited to minimize the risk of LoC?
Should the deployment be rejected on account of the remaining risks outweighing the benefits?

For any proposed affordance or permission:

Is this affordance or permission necessary for the intended use case? (If not, it should not be given.)
- Given the AI system capabilities, what (potentially undesirable) behaviors does this specific affordance or permission unlock?
Is the AI system sufficiently capable of manipulating human users into providing it with more affordances and permissions?
- Under the expectation that it is, what safeguards are in place to prevent the AI system from unduly gaining access to permissions or affordances that would unlock more undesirable capabilities?
For affordances:
- Is the affordance limited by appropriate permissions?
For permissions:
- Does giving this permission follow the principle of least privilege?
- Do the benefits of increased permissions outweigh the risks of potentially reduced human oversight?

Example

We now provide an example run-through of potential considerations for an AI system’s deployment. These considerations aim to provide an intuitive understanding of the process by which the deployment context, affordances, and permissions can be analyzed in accordance with the DAP framework. Note that these steps do not consider the attack surface the AI system might present, such as the AI system being hijacked by prompt injections sent by attackers via email. These considerations ought not to be misconstrued as exhaustive, and all details will ultimately depend on the specific deployment context to be assessed.

Consider the use case of an AI system that can manage an email inbox on behalf of the user, including summarizing received emails and responding to emails.

Deployment Context.

First, consider the greater deployment environment this AI system will be located in.

Is the email inbox a personal inbox only, or is it a company email?
Is confidential information being exchanged via this inbox that would be damaging to the user or others if leaked?
Is this a lower-stakes or higher-stakes deployment context?

For instance, if information leaked from emails coming to and being sent from this inbox would endanger the life or well-being of others, the deployment context is clearly higher-stakes than for an inbox mainly used to subscribe to newsletters and organize public social events.

Is the AI system contained to this specific use case only?
How, if at all, is the AI system integrated with other AI and non-AI systems?
Is there potential for knock-on effects from this AI system malfunctioning that could be detrimental?

In a lower-stakes case, knock-on effects of an AI system malfunction might include the user being unsubscribed from newsletters or meetings being organized without the user’s knowledge. These cases result in annoyance but lack severity in the grand scheme of things. In our hypothetical scenario, considerations about the deployment context result in the conclusion that the deployment context is not high-stakes and that knock-on effects between interconnected systems are acceptable.

Affordances and Permissions.

In our hypothetical scenario, the proposed affordances encompass:

The infrastructure used to send and receive emails.
Calendar read access.
Internet access.
Access to the user’s file system.

In our hypothetical scenario, the proposed permissions encompass:

Reading emails.
Sending emails.
Downloading email attachments.
Deleting emails.

In applying the DAP framework, one should consider:

Are the affordances strictly necessary for the intended functionality?
Are the permissions strictly necessary for the intended functionality?

Two things stand out that do not seem strictly necessary but potentially carry strong downsides. The first is the affordance to access the internet. The affordance to access the internet is not strictly required to reply to emails on behalf of the user, as the necessary information can be supplied by the user, and email addresses can be taken from their contact list.

The second is the permission to delete emails. The permission to delete emails is not strictly necessary, since for the purposes of cleaning up the inbox, archiving the emails is sufficient, while not carrying the risk of the user being unable to recover email correspondence.
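The necessity review for this scenario can be sketched as a set difference between what is proposed and what the intended functionality strictly requires. The identifiers below are hypothetical labels for the items listed above, and the "required" sets encode the reviewer's judgment from this example rather than anything that can be derived automatically.

```python
# Proposed grants, taken from the hypothetical scenario above.
proposed_affordances = {"email_infrastructure", "calendar_read", "internet", "file_system"}
proposed_permissions = {"read_emails", "send_emails", "download_attachments", "delete_emails"}

# Assumption: the reviewer's judgment of what the intended functionality strictly needs.
required_affordances = {"email_infrastructure", "calendar_read", "file_system"}
required_permissions = {"read_emails", "send_emails", "download_attachments"}


def excess(proposed: set, required: set) -> list:
    """Return grants that were proposed but are not strictly necessary."""
    return sorted(proposed - required)


print(excess(proposed_affordances, required_affordances))  # ['internet']
print(excess(proposed_permissions, required_permissions))  # ['delete_emails']
```

Anything returned by `excess` is a candidate for removal before the remaining grants are assessed for downstream risk.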

Having established which of these affordances and permissions are strictly required for the full suite of intended functionality and which are not, we now consider the potential risks of each remaining affordance and permission and how they might be mitigated by reducing the scope of the affordances and permissions.

Have you assessed each affordance for negative downstream impacts?
Have you assessed each permission for negative downstream impacts?
Consider how specifying limitations for each affordance and/or permission can minimize identified downstream impacts.

In pursuit of these checklist items, one may realize that access to the file system allows the AI system to attach any file in the file system to an email, potentially leaking confidential information from the user’s device. This risk can be reduced by limiting the affordance to a specific folder in the file system that only the user can add files to, or by making the “send email” permission conditional on user approval before an email is sent, so that the email attachment can be reviewed. Another option is to remove the affordance altogether and require files to be manually attached.
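Both file-related mitigations can be sketched together: a path check that confines attachments to one folder, and an approval gate in front of sending. This is an illustrative sketch only. The folder path, the function names, and the `draft` dictionary layout are assumptions, `send_email` is a stand-in that returns a status string rather than sending anything, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical folder that only the user can add files to.
SHARED_DIR = Path("/home/user/assistant_outbox")


def attachment_allowed(candidate: str, allowed_dir: Path = SHARED_DIR) -> bool:
    """Allow an attachment only if it resolves to a location inside allowed_dir."""
    # resolve() normalizes ".." segments and follows symlinks before the comparison.
    resolved = Path(candidate).resolve()
    return resolved.is_relative_to(allowed_dir.resolve())


def send_email(draft: dict, approved_by_user: bool) -> str:
    """Stand-in for sending: requires in-scope attachments and explicit user approval."""
    if not all(attachment_allowed(p) for p in draft.get("attachments", [])):
        return "blocked: attachment outside permitted folder"
    if not approved_by_user:
        return "held: waiting for user approval"
    return "sent"


draft = {"to": "colleague@example.com",
         "attachments": ["/home/user/assistant_outbox/report.pdf"]}
print(send_email(draft, approved_by_user=False))  # held: waiting for user approval
print(send_email(draft, approved_by_user=True))   # sent
```

Resolving the path before comparing it is what prevents a request such as `assistant_outbox/../.ssh/id_rsa` from slipping past the folder restriction.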

One may also realize that reading emails presents the risk that the AI system has access to the content of all the received emails and could leak this information to others. This could be mitigated by reducing the AI system’s memory, i.e., the AI system only gets the context of the current email thread, not the entire inbox. This reduces its utility, but it might nonetheless be desirable according to the user’s preferences.

One may realize that reading calendar events carries the risk that the AI system could leak information about private appointments to other contacts. This can be mitigated by giving the AI system access to a calendar view that only displays blocked or unblocked time, rather than appointment details.
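A restricted calendar view of this kind can be sketched as a projection that keeps only time ranges. The `Appointment` record and its fields are hypothetical and do not correspond to any particular calendar API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Appointment:
    """Hypothetical calendar entry as the user sees it."""
    start: datetime
    end: datetime
    title: str
    attendees: tuple


def free_busy_view(appointments):
    """Expose only blocked time ranges, dropping titles and attendees."""
    ordered = sorted(appointments, key=lambda a: a.start)
    return [(a.start, a.end) for a in ordered]


calendar = [
    Appointment(datetime(2025, 1, 2, 14), datetime(2025, 1, 2, 15),
                "Medical appointment", ("clinic@example.com",)),
    Appointment(datetime(2025, 1, 2, 9), datetime(2025, 1, 2, 10),
                "Team sync", ("team@example.com",)),
]

for start, end in free_busy_view(calendar):
    print(f"busy {start:%H:%M} to {end:%H:%M}")
```

The AI system would be handed only the output of `free_busy_view`, so appointment details never enter its context.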

One may realize that allowing the AI system to download email attachments means the AI system may download malware, infecting the machine. The simplest mitigation is not allowing the download of attachments at all. Alternatively, allowing only the download of certain less risky file types (e.g., PDFs) and restricting the size of the download can reduce the risk.
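The attachment mitigations can be sketched as a small download policy. The allowed file types, the size cap, and the function name are assumptions chosen for illustration. A file extension is only a weak signal of what a file contains, so a check like this would complement rather than replace other safeguards.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".pdf"}       # assumption: only PDFs are treated as lower risk
MAX_BYTES = 10 * 1024 * 1024      # assumption: 10 MiB cap, chosen for illustration


def may_download(filename: str, size_bytes: int, downloads_enabled: bool = True) -> bool:
    """Apply the download policy: global switch, file type allowlist, size cap."""
    if not downloads_enabled:
        return False  # simplest mitigation: no attachment downloads at all
    suffix = PurePath(filename).suffix.lower()  # only the final extension counts
    return suffix in ALLOWED_SUFFIXES and 0 <= size_bytes <= MAX_BYTES


print(may_download("report.PDF", 200_000))       # True
print(may_download("invoice.pdf.exe", 200_000))  # False
print(may_download("scan.pdf", 50_000_000))      # False
print(may_download("report.pdf", 200_000, downloads_enabled=False))  # False
```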

After doing all of this, one should consider whether it is desirable to allow the AI system to prompt the user for additional permissions, e.g., to search through past emails for additional context. These prompts for additional affordances and permissions might help increase utility for certain users, but they might also reduce safety if users do not clearly consider the implications of accepting requests. Therefore, the risk versus benefit of such prompts should be weighed according to the use case and the intended user base. Absent these prompts, the risk of an AI system manipulating the user into granting additional affordances or permissions is likely low.

Finally, the decisions made for this AI system in this deployment context should be reassessed as and when the risk profile changes, for instance, because the AI system used is replaced by a more capable one, or the email inbox is used for more confidential information.

References

Amodei, Dario. 2024. “Machines of Loving Grace: How AI Could Transform the World for the Better.” October. https://www.darioamodei.com/essay/machines-of-loving-grace.

Anthropic. 2025. Anthropic’s Responsible Scaling Policy, Version 2.2. https://www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853e4d37.pdf.

Bellan, Rebecca. 2025. “Sam Altman Says OpenAI Will Have a ‘Legitimate AI Researcher’ by 2028.” TechCrunch, October 28. https://techcrunch.com/2025/10/28/sam-altman-says-openai-will-have-a-legitimate-ai-researcher-by-2028/.

Bengio, Yoshua, Sören Mindermann, Daniel Privitera, et al. 2025. International AI Safety Report. Research report. Department for Science, Innovation and Technology. https://internationalaisafetyreport.org/publication/international-ai-safety-report-2025.

Black, Sid, Asa Cooper Stickland, Jake Pencharz, et al. 2025. RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents. https://arxiv.org/abs/2504.18565.

Boudreaux, Benjamin, Michael J. D. Vermeer, Kamaria Horton, and Nidhi Kalra. 2025. The Case for AI Loss of Control Response Planning and an Outline to Get Started. RAND Corporation. https://doi.org/10.7249/PEA4232-1.

Chairman of the Joint Chiefs of Staff. 2012. Joint Operations Security. CJCSI 3213.01D. Chairman of the Joint Chiefs of Staff. https://www.jcs.mil/Portals/36/Documents/Library/Instructions/3213_01.pdf?ver=02nB5Vly_xu_if8TykFrHA%3D%3D.

Clymer, Joshua, Isabella Duan, Chris Cundy, et al. 2025. Bare Minimum Mitigations for Autonomous AI Development. https://arxiv.org/abs/2504.15416.

Clymer, Josh, Hjalmar Wijk, and Beth Barnes. 2024. “The Rogue Replication Threat Model.” November 12. https://metr.org/blog/2024-11-12-rogue-replication-threat-model/.

Cybersecurity and Infrastructure Security Agency. 2025. Critical Infrastructure Sectors. https://www.cisa.gov/topics/critical-infrastructure-security-and-resilience/critical-infrastructure-sectors.

Dassanayake, Rishane, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. Brown, and Edward James Young. 2025. Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework. https://arxiv.org/abs/2507.12872.

Davidson, Tom. 2025. How Can AI Labs Incorporate Risks from AI Accelerating AI Progress into Their Responsible Scaling Policies? https://www.forethought.org/research/how-can-ai-labs-incorporate-risks-from-ai-accelerating-ai-progress-into.

Department of the Army. 2014. Risk Management. DA Pamphlet 385-30. Department of the Army. https://home.army.mil/parks/3615/4595/0848/Safety_Risk_Management_385-30.pdf.

Erdil, Ege, Tamay Besiroglu, and Anson Ho. 2024. Estimating Idea Production: A Methodological Survey. https://arxiv.org/abs/2405.10494.

EU General-Purpose AI Code of Practice (2025). https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai.

European Parliament and Council. 2022. “Directive (EU) 2022/2555 of the European Parliament and of the Council of 14 December 2022 on Measures for a High Common Level of Cybersecurity Across the Union, Amending Regulation (EU) No 910/2014 and Directive (EU) 2018/1972, and Repealing Directive (EU) 2016/1148.” Official Journal of the European Union. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32022L2555.

Google. 2025. Security Controls for Generative AI Systems. https://www.saif.google/secure-ai-framework/controls.

Google DeepMind. 2025. Frontier Safety Framework Version 3.0. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3.pdf.

Grey, Markov, and Charbel-Raphaël Segerie. 2025. “Evaluation Techniques.” In AI Safety Atlas. French Center for AI Safety (CeSIA). https://ai-safety-atlas.com/chapters/05/03#01.

Hoffman, Wyatt, and Heeu Millie Kim. 2023. Reducing the Risks of Artificial Intelligence for Military Decision Advantage. Center for Security and Emerging Technology. https://doi.org/10.51593/2021CA008.

International Committee of the Red Cross. 2021. ICRC Position on Autonomous Weapon Systems. International Committee of the Red Cross. https://www.icrc.org/sites/default/files/document_new/file_list/icrc_position_on_autonomous_weapon_systems.pdf.

Kalra, Nidhi, and Benjamin Boudreaux. 2025. “Not Just Superintelligence: The Many Risks of Near-Future AGI.” Blog post. Geopolitics of AGI (Substack), July 28. https://geopoliticsagi.substack.com/p/not-just-superintelligence-the-many.

Koessler, Leonie, Jonas Schuett, and Markus Anderljung. 2024. Risk Thresholds for Frontier AI. https://arxiv.org/abs/2406.14713.

Kokotajlo, Daniel, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean. 2025. “AI 2027: A Research-Backed AI Scenario Forecast.” AI Futures Project, April 3. https://ai-2027.com/.

Meinke, Alexander, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn. 2025. Frontier Models Are Capable of in-Context Scheming. https://arxiv.org/abs/2412.04984.

METR. 2025. Common Elements of Frontier AI Safety Policies. https://metr.org/common-elements.pdf.

Moor, Oege de, and Albert Ziegler. 2025. XBOW Unleashes GPT-5’s Hidden Hacking Power, Doubling Performance. XBOW Blog. https://xbow.com/blog/gpt-5.

NIST. 2025a. “Least Privilege.” NIST Computer Security Resource Center (CSRC). https://csrc.nist.gov/glossary/term/least_privilege.

NIST. 2025b. “Permission.” NIST Computer Security Resource Center (CSRC). https://csrc.nist.gov/glossary/term/Permission.

OpenAI. 2025. Preparedness Framework V2. OpenAI. https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf.

Perrow, Charles. 1984. Normal Accidents: Living with High-Risk Technologies. Basic Books.

Phuong, Mary, Matthew Aitchison, Elliot Catt, et al. 2024. Evaluating Frontier Models for Dangerous Capabilities. https://arxiv.org/abs/2403.13793.

Phuong, Mary, Roland S. Zimmermann, Ziyue Wang, et al. 2025. Evaluating Frontier Models for Stealth and Situational Awareness. https://arxiv.org/abs/2505.01420.

Presidential Policy Directive/PPD-21: Critical Infrastructure Security and Resilience, Presidential Policy Directive No. PPD-21 (2013). https://www.cisa.gov/sites/default/files/2023-01/ppd-21-critical-infrastructure-and-resilience-508_0.pdf.

Rivera, Juan-Pablo, Gabriel Mukobi, Anka Reuel, Max Lamparth, Chandler Smith, and Jacquelyn Schneider. 2024. “Escalation Risks from Language Models in Military and Diplomatic Decision-Making.” The 2024 ACM Conference on Fairness Accountability and Transparency, June, 836–98. https://doi.org/10.1145/3630106.3658942.

Shah, Rohin, Alex Irpan, Alexander Matt Turner, et al. 2025. An Approach to Technical AGI Safety and Security. https://arxiv.org/abs/2504.01849.

Sharkey, Lee, Clíodhna Ní Ghuidhir, Dan Braun, et al. 2024. A Causal Framework for AI Regulation and Auditing. https://doi.org/10.20944/preprints202401.1424.v1.

Shevlane, Toby, Sebastian Farquhar, Ben Garfinkel, et al. 2023. Model Evaluation for Extreme Risks. https://arxiv.org/abs/2305.15324.

Somani, Elika, Anjay Friedman, Henry Wu, et al. 2025. Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents. RAND Corporation. https://doi.org/10.7249/RRA3847-1.

Song, Yueqi, Frank Xu, Shuyan Zhou, and Graham Neubig. 2025. Beyond Browsing: API-Based Web Agents. https://arxiv.org/abs/2410.16464.

Stix, Charlotte, Matteo Pistillo, Girish Sastry, et al. 2025. AI Behind Closed Doors: A Primer on the Governance of Internal Deployment. https://arxiv.org/abs/2504.12170.

The Global Commission on Responsible Artificial Intelligence in the Military Domain. 2025. Responsible by Design: Strategic Guidance Report on the Risks, Opportunities, and Governance of Artificial Intelligence in the Military Domain. The Global Commission on Responsible Artificial Intelligence in the Military Domain. https://hcss.nl/wp-content/uploads/2025/09/GC-REAIM-Strategic-Guidance-Report-Final-WEB.pdf.

United Nations General Assembly. 2023. Lethal Autonomous Weapons Systems. A/RES/78/241. United Nations. https://docs.un.org/en/A/RES/78/241.

United Nations General Assembly. 2024. Artificial Intelligence in the Military Domain and Its Implications for International Peace and Security. A/RES/79/239. United Nations. https://docs.un.org/en/a/res/79/239.

United States Congress. 2001. 42 U.S. Code § 5195c - Critical Infrastructures Protection. https://www.law.cornell.edu/uscode/text/42/5195c#e.

U.S. Department of State, Bureau of Arms Control, Verification and Compliance. 2023. Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy. https://www.state.gov/political-declaration-on-responsible-military-use-of-artificial-intelligence-and-autonomy-2/.

Footnotes

An AI system’s capabilities refer to the behaviors the AI system can perform under ideal conditions. That is, they are the abilities the AI system has when it is optimally elicited and sufficiently resourced, which may be very different from the abilities the AI system has in realistic conditions (Sharkey et al. 2024).
An AI system’s propensities refer to the behaviors the AI system tends to display in real-world deployment conditions, for example under real-world prompting and safety mechanisms (Sharkey et al. 2024).
We note, however, that expert consensus appears to be that current AI capabilities are insufficient to enable LoC threats (Bengio et al. 2025).
More details can be found in Chapter 1 and in Figure 2.
The sectors included in CNI may vary between jurisdictions or legislative purposes. For example, the EU’s cybersecurity NIS-2 directive includes energy, transport, banking, financial market infrastructure, health, drinking water, waste water, digital infrastructure, information and communication technology service management, public administration and space as sectors with “high-criticality” (European Parliament and Council 2022). The U.S. Cybersecurity and Infrastructure Security Agency does not include some of these, but includes sectors such as dams; the defense industrial base; and nuclear reactors, materials, and waste (Cybersecurity and Infrastructure Security Agency 2025).
An additional example is army operations themselves, which are considered “inherently dangerous” according to the Risk Management Pamphlet of the Department of the Army (Department of the Army 2014).
We note that deployment in select high-stakes deployment contexts (for instance, direct control of nuclear weapons), where LoC would lead to critical and absolute consequences, should be rejected regardless of the potential benefits and safeguards.
We note that the type of access an AI system has can be constrained via permissions. That is why we consider all three elements of the DAP framework together when assessing risk: deployment context, affordances, and permissions.
We provide a high-level checklist and an example of how to implement these intervention points in Appendix 3.