
AI Data Centers Are Becoming Cyber-Physical Systems, and Security Must Follow

AI is pushing data centers into a new operating regime: denser compute, higher and more volatile power draw, more automation, and tighter coupling between IT operations and facilities control systems such as building management systems (BMS), data center infrastructure management (DCIM), and electrical power monitoring systems (EPMS). That coupling changes the risk equation. Many cyber incidents that used to be “IT problems” now have a credible pathway to become cyber-physical events: reduced capacity, power outages, thermal excursions, equipment stress, forced shutdowns, and availability impacts that propagate across dependent services and customer SLAs.

The booklet describes why this shift is accelerating under growth pressure. The International Energy Agency projects that electricity demand from data centers worldwide will more than double by 2030 to around 945 TWh, with AI a key driver and demand from AI-optimized data centers projected to more than quadruple.[1]

 

The grey space is now the critical attack surface

In data centers, operational technology is not limited to traditional industrial environments. Operational technology includes programmable systems that sense and act on the physical world — and in a data center it spans the grey space that makes the white space possible.[2] The booklet frames this as a multidimensional exposure surface across power, cooling, airflow, monitoring, and physical systems.

•    Electrical power monitoring and control: electrical power monitoring systems, transformers, switchgear management, UPS and generator management, transfer systems, power distribution, and breaker controls. UPS and transfer switch issues have led to outages.[3]
•    Cooling and environmental controls: chillers, cooling towers, CRAH/CRAC units, pumps, valves, and building management systems. Chiller failures have led to thermal runaway and shutdown.[4][5]
•    Data center infrastructure management: platforms that integrate power and thermal telemetry with operations workflows and help optimize the balance of heat dissipation, cooling, and power consumption.
•    Physical security and life safety: systems with networked monitoring and control (access control, CCTV, fire interfaces).

It is valuable to review major outages affecting data centers, as accidental events provide a preview of outages that can occur from an OT attack targeting the grey space.

Unintentional cyber incidents can cause the same impacts as malicious incidents, and they can serve as templates for malicious attacks.

A malicious incident can also appear to be an unintentional one. A sufficiently skilled actor can make an attack look like a system or equipment reliability issue, which delays detection.

Facilities control systems are engineered for availability and safe operation, and they often carry operational constraints that complicate conventional scanning, patching, and configuration change. That is why facilities controls require an approach that respects how the environment actually operates, not only how a typical IT environment is secured.

 

The primary cyber-physical threats are operational — because losses are operational

For AI-intensive environments, it is useful to describe threats in terms of how they affect service delivery, equipment integrity, and safe operation. Across modern data centers, the booklet highlights high-impact patterns that recur in real-world operating models.

•    Loss of view or loss of control in facilities systems (DCIM/BMS/EPMS), forcing conservative operations and capacity reduction
•    Unauthorized changes to setpoints, sequences, or control logic, which can cascade into thermal derating, stress, premature equipment failure, and shutdowns
•    Operationally disruptive ransomware or destructive attacks that start in IT but propagate through shared identity, monitoring, backup, or management dependencies
•    Remote access abuse and trusted-relationship compromise through overly privileged or poorly monitored vendor access
•    Data integrity attacks against telemetry and dashboards that drive wrong operator decisions, cause unintended PLC responses, or mask malicious activity
•    Blended cyber and physical operations where physical reconnaissance, tampering, or rogue devices support cyber objectives (and vice versa)

A key point from the booklet: the worst outcomes are rarely driven by a single vulnerability. They emerge from combinations of exposure conditions that increase both the probability an attacker can reach critical systems and the magnitude of loss once disruption begins.

 

Why AI amplifies exposure in a multidimensional way

AI does not just add more servers. It changes how the facility operates and how tightly coupled the system becomes. The booklet outlines exposure multipliers that data center leaders can recognize in practice:

•    Faster growth expands attack surface and heterogeneity: more vendors, more one-off designs, more integration points
•    Power management becomes more instrumented and automated: volatile AI loads increase reliance on networked power components and remote support
•    Dependency and concentration risk rises: co-located dependencies (power, water, telecom) create ripple effects across the site and potentially the region
•    Temporary generation and transitional infrastructure add interfaces and vendor touchpoints
•    Strong IT security can mask OT gaps when facilities operational technology is treated as vendor-managed by default

The result is not a single “OT risk.” It is a web of cyber, physical, operational, and dependency exposures that interact — and that must be governed together.

 

Download the Data Center Owner Playbook


 

A practical control blueprint: reduce frequency, reduce magnitude, prove it works

Most organizations have more initiatives than they can execute at once. The booklet frames prioritization through frequency and magnitude: some controls reduce the chance a scenario succeeds; others reduce downtime and blast radius when it does. The aim is to be explicit about which lever each control pulls, and to require evidence that controls are operating as intended. This blueprint aligns with common OT security guidance around segmentation, access control, monitoring, and recovery.[2]

The booklet prioritizes controls that repeatedly show high leverage for cyber-physical loss reduction in data centers:

1) Design for separation (zones, conduits, control planes)

Define zones explicitly (Corporate IT, Operations IT, facilities management platforms, power operational technology, cooling operational technology, physical security/life safety, vendor access). Separate zones for site-wide infrastructure from those that support a single building or data hall, so that administration and access procedures align with impact level. Create narrow conduits with allowlisted protocols and destinations, and apply deny-by-default rules to both ingress and egress. Ensure there is no direct path from corporate user networks to controller-level networks, using demilitarized zone patterns and hardened intermediary access.
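
As a minimal sketch of the deny-by-default conduit idea, the Python below models conduits as an explicit allowlist of flows between zones. The zone names and the specific conduits are illustrative assumptions, not taken from the booklet or from any firewall product; the ports are the standard ones for HTTPS, BACnet/IP, and Modbus TCP.

```python
from typing import NamedTuple

class Conduit(NamedTuple):
    src_zone: str
    dst_zone: str
    protocol: str
    port: int

# Hypothetical allowlist: every permitted flow is listed explicitly.
ALLOWED_CONDUITS = {
    Conduit("corporate_it", "ot_dmz", "https", 443),
    Conduit("ot_dmz", "facilities_platforms", "https", 443),
    Conduit("facilities_platforms", "cooling_ot", "bacnet", 47808),
    Conduit("facilities_platforms", "power_ot", "modbus_tcp", 502),
}

def is_allowed(src_zone: str, dst_zone: str, protocol: str, port: int) -> bool:
    """Deny by default: a flow passes only if it matches an allowlisted conduit."""
    return Conduit(src_zone, dst_zone, protocol, port) in ALLOWED_CONDUITS

def direct_paths(src_zone: str, dst_zone: str) -> list:
    """List any conduits that connect two zones directly."""
    return [c for c in ALLOWED_CONDUITS
            if c.src_zone == src_zone and c.dst_zone == dst_zone]
```

A check such as `direct_paths("corporate_it", "power_ot")` returning an empty list is one way to express the "no direct path from corporate to controller networks" requirement as a testable assertion.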

2) Make remote access provably safe

Treat remote access as a safety-critical feature: brokered access (no direct VPN to operational technology), strong authentication, just-in-time authorization, least privilege per asset, limited time window, session controls, forensic recording, anomaly detection, and device posture requirements.
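
To illustrate just-in-time authorization with least privilege per asset and a limited time window, here is a hedged sketch. The `AccessGrant` structure, the user and asset names, and the two-hour window are hypothetical; a real access broker would add session recording, device posture checks, and anomaly detection on top of this decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class AccessGrant:
    """A hypothetical just-in-time grant: one user, one asset, one time window."""
    user: str
    asset: str
    starts_at: datetime
    duration: timedelta

def session_permitted(grant: AccessGrant, user: str, asset: str,
                      now: datetime, mfa_verified: bool) -> bool:
    """Permit a session only with MFA, inside the window, for the granted asset."""
    in_window = grant.starts_at <= now < grant.starts_at + grant.duration
    return (mfa_verified and in_window
            and user == grant.user and asset == grant.asset)

# Example grant: a vendor technician gets two hours on a single controller.
start = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
grant = AccessGrant("vendor_tech_01", "chiller_plc_3", start, timedelta(hours=2))
```

The same grant does not authorize a different asset, a session after the window closes, or a session without verified MFA.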

3) Protect the engineering function

Harden engineering workstations and administrative consoles, separate engineering tools from general endpoints, implement change control for logic and setpoints with rollback to known-good versions, and maintain offline baselines for controller configurations and firmware.
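
One way to make offline baselines useful is to compare a fingerprint of each controller's current configuration export against the stored known-good value. The sketch below assumes configurations can be exported as bytes; the function names and controller names are hypothetical.

```python
import hashlib

def fingerprint(config_bytes: bytes) -> str:
    """SHA-256 fingerprint of an exported controller configuration."""
    return hashlib.sha256(config_bytes).hexdigest()

def find_drift(baseline: dict, current: dict) -> list:
    """Return controllers whose current export does not match the baseline.

    baseline maps controller name -> known-good fingerprint (kept offline).
    current maps controller name -> freshly exported configuration bytes.
    Controllers missing from the baseline are reported as drift too.
    """
    drifted = []
    for controller, config_bytes in current.items():
        if baseline.get(controller) != fingerprint(config_bytes):
            drifted.append(controller)
    return sorted(drifted)
```

Any controller reported here would then go through change control: either the change is approved and the baseline updated, or the controller is rolled back to the known-good version.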

4) Instrument operational technology for detection and rapid containment

Deploy passive monitoring for relevant protocols, baseline normal traffic, collect logs from access infrastructure and facilities platforms, correlate identity to actions, and define cyber-physical stop conditions and response playbooks.
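
Baselining normal traffic can be sketched very simply: record the flows seen during a known-good period, then report anything outside that set. The flow tuple layout and the function names below are assumptions for illustration; production OT monitoring tools also inspect protocol function codes and payloads.

```python
from collections import Counter

def build_baseline(flows):
    """Flows are (source, destination, protocol) tuples captured passively
    during a period of known-good operation."""
    return set(flows)

def unseen_flows(baseline, window_flows):
    """Count flows in the current window that never appeared in the baseline."""
    return dict(Counter(f for f in window_flows if f not in baseline))
```

A new source talking Modbus to a power controller, for example, would appear in the result with its occurrence count and could trigger the relevant response playbook.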

5) Build recoverability as a first-class requirement

Restoration often fails or takes longer than expected because restores are untested or unfamiliar. Define recovery targets for facilities platforms and critical control components; test restores regularly; practice safe return-to-service procedures; and maintain spares and lead-time plans for high-risk components.
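
Recovery targets are easier to govern when restore evidence is checked against them automatically. The following sketch flags components that were never tested, whose last test is stale, or whose measured restore time exceeded the target. The record fields and the 180-day freshness threshold are assumptions, not values from the booklet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class RestoreEvidence:
    component: str
    target_hours: float               # agreed recovery target
    measured_hours: Optional[float]   # None if a restore was never tested
    tested_on: Optional[date]

def recovery_gaps(records, today: date, max_age_days: int = 180) -> dict:
    """Map each component with a recoverability gap to the reason for it."""
    gaps = {}
    for r in records:
        if r.measured_hours is None or r.tested_on is None:
            gaps[r.component] = "never tested"
        elif (today - r.tested_on).days > max_age_days:
            gaps[r.component] = "test is stale"
        elif r.measured_hours > r.target_hours:
            gaps[r.component] = "restore exceeded target"
    return gaps
```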

6) Close the build and commissioning security gap

Rotate credentials, remove temporary access, validate segmentation and firewall policy against the intended design, formalize vendor access governance, and require complete handover (asset inventory, firmware versions, backup locations, and clear operational owners).

7) Manage dependency and concentration risk

Map critical dependencies (utility feeds and substations, water, fuel, telecom routes, upstream digital dependencies) and IT-OT dependencies (MFA, Active Directory, virtualization, network management); validate diversity claims; model node-failure scenarios (including cyber-triggered variants); and coordinate restoration processes with utilities and municipalities.
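
A dependency map can be treated as a directed graph, and a node-failure scenario as a traversal of everything downstream of the failed node. The map below is entirely hypothetical, and the sketch makes a simplifying assumption that every dependency is hard (no redundancy), so it gives a worst-case view rather than a prediction.

```python
from collections import deque

# Hypothetical map: each node lists what it depends on.
DEPENDS_ON = {
    "data_hall_a": ["ups_a", "chiller_plant"],
    "data_hall_b": ["ups_b", "chiller_plant"],
    "ups_a": ["utility_feed_1"],
    "ups_b": ["utility_feed_2"],
    "chiller_plant": ["bms", "water_supply"],
    "bms": ["active_directory"],
}

def impacted_by(failed_node: str, depends_on: dict) -> set:
    """Return every node that directly or transitively depends on failed_node."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    impacted, queue = set(), deque([failed_node])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, []):
            if node not in impacted:
                impacted.add(node)
                queue.append(node)
    return impacted
```

In this example, losing `active_directory` reaches both data halls through the BMS and the shared chiller plant, which is the kind of IT-OT concentration risk the mapping exercise is meant to surface.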

8) Integrate physical and cyber security

Treat control rooms, switchgear rooms, and fuel infrastructure as high-security zones; unify identity governance across badges, privileged accounts, and vendors; correlate physical entry with cyber sessions; and monitor for reconnaissance and tampering.
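
Correlating physical entry with cyber sessions can start with a simple rule: a local console session in a secured room should be preceded by a badge entry from the same identity. The event fields, the eight-hour look-back window, and the assumption that badge and account identities are already unified are all hypothetical choices for this sketch.

```python
from datetime import datetime, timedelta

def sessions_without_entry(sessions, badge_events, window=timedelta(hours=8)):
    """Flag local console sessions with no matching badge entry.

    sessions: dicts with "user", "room", "start" (datetime).
    badge_events: dicts with "user", "room", "time" (datetime).
    A session matches if the same user badged into the same room at or
    before the session start, and no earlier than `window` before it.
    """
    flagged = []
    for s in sessions:
        has_entry = any(
            b["user"] == s["user"]
            and b["room"] == s["room"]
            and timedelta(0) <= s["start"] - b["time"] <= window
            for b in badge_events
        )
        if not has_entry:
            flagged.append(s)
    return flagged
```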

 

Why quantification matters: turning exposure into decisions

One of the most important ideas in the booklet is that quantification is not about false precision. It is about making trade-offs explicit. The approach models multiple scenarios as threat actor plus pathway plus target plus operational consequence, then estimates:
•    Loss Event Frequency: how often the threat actors are likely to succeed given pathways and control strength
•    Loss Magnitude: the distribution of loss outcomes such as downtime, incident response expense, equipment damage, contractual penalties, and recovery costs
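
The two estimates combine naturally in a Monte Carlo simulation. The sketch below is one common way to do this and is not the booklet's or any product's model: it assumes loss events per year follow a Poisson distribution with mean equal to the Loss Event Frequency, and that the loss per event is lognormal. All parameter values a caller supplies are illustrative.

```python
import math
import random

def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method; adequate for the small annual frequencies used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_annual_loss(lef, median_loss, sigma, trials=10_000, seed=1):
    """Return a sorted list of simulated annual losses for one scenario.

    lef: expected loss events per year (assumed Poisson mean).
    median_loss, sigma: parameters of the assumed lognormal loss per event.
    """
    rng = random.Random(seed)
    mu = math.log(median_loss)
    losses = []
    for _ in range(trials):
        events = sample_poisson(rng, lef)
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(events)))
    losses.sort()
    return losses

def summarize(losses):
    """Mean annual loss and an approximate 95th percentile."""
    mean = sum(losses) / len(losses)
    p95 = losses[int(0.95 * (len(losses) - 1))]
    return mean, p95
```

Running the simulation twice, once with the current frequency and magnitude estimates and once with the values expected after a control is in place, gives a side-by-side view of how much each option shifts the mean and the tail.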

This makes prioritization operational. Leaders can decide whether segmentation and remote access redesign reduce more cyber-physical risk than accelerated patching, given real constraints in facilities operational technology environments.

 

Secure capacity is a competitive advantage

In the AI era, capacity is valuable only if it is resilient. The booklet makes the case that owners and operators who treat facilities controls as a cyber-physical system — segmented, governed, monitored, and recoverable — can deliver higher availability and faster, safer recovery when incidents occur. The best time to build these controls is during design and commissioning; the second-best time is before the next expansion wave.

 


 

Relevant DeNexus Products

•    DeRISK QVM (Quantified Vulnerability Management):  https://www.denexus.io/products/derisk/derisk-quantified-vulnerability-management
Useful when you need risk-based vulnerability prioritization and remediation planning across environments that include facilities OT and supporting platforms.


•    DeRISK CRQ (Cyber Risk Quantification Management): https://www.denexus.io/products/derisk/cyber-risk-quantification-management 
Useful when you need to quantify cyber-physical risk in financial and operational terms to support prioritization, reporting, and investment decisions.

 

References

•    [1] International Energy Agency (IEA), “AI is set to drive surging electricity demand from data centres while offering the potential to transform how the energy sector works,” April 10, 2025. URL: https://www.iea.org/news/ai-is-set-to-drive-surging-electricity-demand-from-data-centres-while-offering-the-potential-to-transform-how-the-energy-sector-works

•    [2] National Institute of Standards and Technology (NIST), “Guide to Operational Technology (OT) Security (NIST SP 800-82 Rev. 3),” September 2023. URL: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r3.pdf


•    [3] Microsoft Azure, “Azure Incident Retrospective: Datacenter power issue, East US (Tracking ID: 2LZ0-3DG),” September 2023. URLs: https://www.youtube.com/watch?v=VU0XttRVyOg and https://azure.status.microsoft/en-us/status/history/

•    [4] Microsoft Azure, “Azure Incident Retrospective: Datacenter cooling in Southeast Asia (Tracking ID: VN11-JD8),” February 2023. URLs: https://www.youtube.com/watch?v=5RhF2zk40LI and https://azure.status.microsoft/en-us/status/history/

•    [5] Microsoft Azure, “Azure Incident Retrospective: Datacenter cooling, Australia East (Tracking ID: VVTQ-J98),” August 2023. URLs: https://www.youtube.com/watch?v=j7TxtshCoLw and https://azure.status.microsoft/en-us/status/history/

 

Download the full playbook

We have published a practical operator’s playbook that translates these principles into a concrete, lifecycle-oriented blueprint — design, commissioning, operations, incident response, and recovery — plus an evidence checklist and a structured OT Cyber Risk Quantification sprint to prioritize actions.