AI for the SOC: What Works, What Doesn't Yet, and the Path to Autonomy

AI for the SOC is everywhere in slide decks and almost nowhere in production. Vendors demo agents that triage alerts in seconds, yet MIT's NANDA study found only about 5% of custom enterprise AI tools ever reach production. The operational reality AI was supposed to fix has not moved: roughly 40% of SOC alerts are never investigated, and 66% of SOC teams say they cannot keep pace with alert volume. So the question worth asking is not "can AI investigate an alert?" (it demonstrably can) but "why is AI for the SOC stuck in the pilot phase, and what does it take to reach an autonomous SOC?"

The short answer: most agentic SOC tools solve the easy half of the problem and ignore the hard half. They can read telemetry and draft a summary. They cannot supply the judgment, fit the workflow, or out-reason the platform-native agents that now ship for free. Three gaps explain the distance between the demo and production.

The three gaps

Gap 1: Human intelligence

An AI agent can pull the sign-in logs, correlate the IPs, summarize the alert, and recommend a next step. Then it stops and hands the verdict back to a human. That hand-off is not a UX choice. It is structural. The verdict depends on judgment and tribal knowledge that was never written down:

Do we suspend this device, or is the user a field engineer who loses a day of work if we do?
Is this a malicious login, or an exec traveling through a new ASN?
Did the activity touch a production server we have to protect right now, or a sandbox nobody cares about?

None of that lives in the telemetry. It lives in the heads of the senior analysts and in the accumulated context of how this specific environment behaves. The common response is to hand-curate that knowledge into static files. CrowdStrike's AgentWorks demo shows a user literally dragging in an "IOC-review-process" file and an "asset-prioritization" file to feed the agent. That is a losing race. Judgment does not live in documents, and context shifts faster than anyone can write it down. The asset that was a sandbox last quarter is production this quarter. The "impossible travel" pattern is normal for the team that just opened a Singapore office. Static knowledge files are stale the day they are saved. This is the tribal knowledge problem, and it is the single biggest reason AI investigations stall one step short of a decision.

Gap 2: Operational fit

SOCs already drown in tool sprawl. A typical team runs 10 to 20 products spanning 200+ tools. The last thing that environment needs is one more console. Yet nearly every AI-SOC vendor ships its agent as a new portal and expects the analyst to come to it. That is backwards. Enterprises have spent years building battle-tested workflows inside a customized control tower (most commonly ServiceNow, sometimes Palo Alto or Google SecOps) with escalation paths, approval gates, RBAC, and an audit trail already wired in. An agent that forces a new front end asks the team to abandon all of it.

Buyers have been explicit about this: 66% of enterprise buyers rank minimal disruption as their top requirement for an AI tool. An agent that fits the existing workflow gets adopted. An agent that demands a new one gets piloted and quietly shelved. "Supercharge your SOC, without changing it" is not a slogan; it is the only adoption path that survives contact with a real security org.

Gap 3: Agent judgment, and the rise of the native agents

This is the gap that has shifted most in the last year. The first wave of AI-SOC startups built agents to investigate alerts from Microsoft Defender, CrowdStrike Falcon, and Palo Alto Cortex. Then the platforms shipped their own: Microsoft Security Copilot, CrowdStrike Charlotte AI, Palo Alto Cortex. For their own alerts, the native agents win. They have privileged access to the underlying telemetry and the detection logic that fired the alert; an external agent is reverse-engineering from the outside what the native agent reads directly.

Worse for the external-agent business model, the native agents are becoming free. Security Copilot is bundled into Microsoft E5. CrowdStrike is seeding Charlotte AI with free credits. Competing head-on with a native investigator converging on free is not a sustainable position.

The right move is not to build a better Defender investigator than Microsoft. It is to orchestrate these agents and, above all, to judge them: weigh a Charlotte verdict against a Security Copilot verdict, reconcile the conflicts, fold in the asset and identity context the platform agents do not have, and decide when to act. Judging the agents takes exactly the human intelligence the agents lack, which loops straight back to Gap 1. We go deeper on this in judging the native security agents.
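
To make "judging the agents" concrete, here is a minimal sketch of reconciling two native-agent verdicts against local context. Everything in it is an assumption for illustration: the AgentVerdict and AssetContext shapes, the agent labels, and the decision rules are hypothetical, and no vendor API is called. Note that the function only returns a recommendation; it never takes an action.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AgentVerdict:
    agent: str        # label only, e.g. "agent_a"; hypothetical, not a vendor API
    malicious: bool   # the agent's own conclusion about the alert


@dataclass(frozen=True)
class AssetContext:
    is_production: bool          # context the platform agents do not have
    known_benign_pattern: bool   # e.g. a team whose travel makes the pattern normal


def judge(verdicts: Sequence[AgentVerdict], ctx: AssetContext) -> str:
    """Return 'escalate', 'review', or 'close'. Recommends only; never acts."""
    if not verdicts:
        return "review"      # nothing to weigh, so a human looks at it
    flagged = [v for v in verdicts if v.malicious]
    if not flagged:
        return "close"       # every agent cleared the alert
    if ctx.is_production:
        return "escalate"    # any malicious verdict on production goes up
    disagree = len(flagged) < len(verdicts)
    if disagree or ctx.known_benign_pattern:
        return "review"      # conflict or local context lowers urgency
    return "escalate"


verdicts = [AgentVerdict("agent_a", True), AgentVerdict("agent_b", False)]
print(judge(verdicts, AssetContext(is_production=False, known_benign_pattern=False)))
```

The interesting inputs are the two AssetContext fields: they stand in for the human intelligence from Gap 1, which is exactly what the platform agents cannot supply on their own.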

Why "replace the analyst" is the wrong frame

A lot of agentic-SOC marketing implies the analyst is about to be automated away. The data on what enterprises actually permit tells a different story. Per Darktrace's 2026 figures, only 14% of enterprises allow AI to take independent remediation actions, while 70% run strictly human-in-the-loop. That is not timidity. In a SOC, a wrong autonomous action (suspending the CFO's account during a board meeting, or isolating a production database) is its own incident. The cost of a false positive that takes action is high enough that human approval is rational, not reluctant.

Pair that with the 66% minimal-disruption requirement and the design direction is clear. The job is not to replace the analyst. It is to amplify the analyst: do the legwork, assemble the evidence, surface the recommendation with its reasoning, and leave the decision, and the action, with a human who can be held accountable for it. An assistant that makes a good analyst faster is a product enterprises will deploy. An "autonomous remediation" agent they will not let touch production is a pilot that never converts.

This is also why "the model will get smarter and the problem solves itself" misses the point. Even a perfect reasoner still needs the environment-specific judgment that was never written down, still needs to live inside the existing control tower, and still has to earn the right to act. Better models help with investigation. On their own, they do nothing for the three gaps.

The missing layer

Put the three gaps together and the shape of what is missing becomes obvious. The SOC needs a layer that:

Captures the human intelligence the team already holds (not by asking analysts to write documents, but by learning from how they actually triage, what they treat as normal, and the calls the senior people make) and uses it to judge alerts and verdicts.
Judges and orchestrates the native agents rather than competing with them, reconciling their outputs and deciding when to act.
Lives inside the control tower the team already runs (ServiceNow, Palo Alto, Google SecOps) so it fits the workflow instead of adding a console.

That layer is what makes an autonomous enterprise SOC possible, and it is the layer Cade builds. A few properties make it work in practice rather than in a demo.

Organizational memory is a compounding moat. The longer a team uses the layer, the more valuable it gets. Six months in, it has absorbed how this team triages, what is normal in this environment, and what the senior analysts know, and that accumulated knowledge is non-transferable to a competitor. Switching costs grow every month, not because of lock-in tricks, but because the captured judgment is genuinely yours and specific to you. This is the inverse of the static-file approach: instead of decaying the day it is written, the knowledge compounds.

A persistent knowledge graph resolves entities across tools. Users, hosts, apps, and sessions get reconciled into single identities spanning every product, which is what lets the layer answer "what is the blast radius of this account" instead of returning four disconnected tool views. This matters acutely for identity-based attacks, where a single compromised identity surfaces as fragments across Entra, the EDR, the email gateway, and the IdP, and the whole investigation depends on stitching those fragments back into one entity.
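
One way to picture the stitching step is a union-find structure over (tool, identifier) pairs: each observed link between two identifiers merges them into the same entity. This is a sketch under stated assumptions, not a description of how any particular product implements its graph. The EntityGraph class, the tool labels, and the sample identifiers are all hypothetical.

```python
class EntityGraph:
    """Toy entity resolver: identifiers linked together share one entity."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        # Register unseen identifiers as their own root.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # Path halving keeps later lookups short.
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def link(self, a, b):
        """Record evidence that identifiers a and b are the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def entity_of(self, identifier):
        """Return every identifier known to belong to the same entity."""
        root = self._find(identifier)
        return {i for i in list(self._parent) if self._find(i) == root}


graph = EntityGraph()
# Hypothetical fragments of one user as seen by four different tools.
graph.link(("entra", "jdoe@example.com"), ("edr", "LAPTOP-7\\jdoe"))
graph.link(("email", "jdoe@example.com"), ("idp", "u-1001"))
graph.link(("entra", "jdoe@example.com"), ("email", "jdoe@example.com"))
# An unrelated user stays a separate entity.
graph.link(("entra", "asmith@example.com"), ("edr", "LAPTOP-9\\asmith"))

print(sorted(graph.entity_of(("idp", "u-1001"))))
```

Starting from any one fragment, entity_of returns all four identifiers for the same account, which is the starting set for a blast-radius question. A real graph would also need to decide which links are trustworthy enough to merge on; that part is not modeled here.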

Tool Intelligence tames the sprawl. The 10 to 20 products and 200+ tools a SOC runs get aggregated into one coherent set with AI-optimized descriptions. That directly cuts the tool-selection errors that wreck naive multi-tool AI, where the agent calls the wrong API or the wrong product and quietly returns garbage.
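
As a rough illustration of why descriptions matter for tool selection, the sketch below keeps one catalog across products and picks a tool by word overlap between the request and each description. The matching rule is a deliberately naive stand-in chosen for this example, and the Tool shape, product labels, and tool names are hypothetical. The one design point worth noticing is that it returns None when nothing matches, rather than guessing and quietly returning garbage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tool:
    product: str       # which product exposes the tool (hypothetical labels)
    name: str
    description: str   # the text the selector matches against


def select_tool(catalog: Sequence[Tool], request: str) -> Optional[Tool]:
    """Pick the tool whose description shares the most words with the request.

    Returns None when no description overlaps at all, so the caller can
    stop instead of calling the wrong API.
    """
    words = set(request.lower().split())
    best, best_score = None, 0
    for tool in catalog:
        score = len(words & set(tool.description.lower().split()))
        if score > best_score:
            best, best_score = tool, score
    return best


catalog = [
    Tool("edr", "isolate_host", "isolate a host from the network"),
    Tool("idp", "list_signins", "list recent sign-in events for a user"),
    Tool("email", "search_messages", "search delivered email messages by sender"),
]

choice = select_tool(catalog, "recent sign-in events for user jdoe")
print(choice.name if choice else "no matching tool")
```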

It is built to be trusted with security work. Every recommendation comes with transparent reasoning. Every action is auditable. Access is governed by granular RBAC. And there is no data centralization. Data is queried in real time and left where it lives, rather than copied into yet another store you now have to secure and govern. The analyst stays in control; the layer does not make autonomous decisions on your behalf today.
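
The pairing of RBAC and auditability can be sketched in a few lines: every permission check is recorded, whether it is allowed or denied, together with the reasoning behind the request. The role names, permission strings, and AuditLog structure below are assumptions made up for this example, not a real product schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "tier1_analyst": {"read_alert"},
    "senior_analyst": {"read_alert", "approve_containment"},
}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, actor, permission, allowed, reasoning):
        # Append-only in this sketch; denied attempts are logged too.
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "permission": permission,
            "allowed": allowed,
            "reasoning": reasoning,
        })


def authorize(role, permission, actor, reasoning, log):
    """Check the role's permissions and write the outcome to the audit log."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())
    log.record(actor, permission, allowed, reasoning)
    return allowed


log = AuditLog()
print(authorize("tier1_analyst", "approve_containment", "alice",
                "sign-in from new ASN on a production host", log))
print(authorize("senior_analyst", "approve_containment", "bob",
                "sign-in from new ASN on a production host", log))
```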

One more design choice underwrites all of this: the layer is foundation-model-neutral. It is built around frontier models rather than a bespoke reasoning engine that gets stranded on a model from two years ago. As the models improve, the layer improves with them, the same way Harvey rides frontier models for legal work. The durable value is not the model; it is the captured judgment, the graph, and the fit.

The path to autonomy

The autonomous SOC is a destination, not a launch feature, and the honest path to it has stages.

Today: an AI assistant. It investigates, correlates, and recommends with transparent reasoning, inside ServiceNow and the other tools the team already lives in. The human makes every decision and takes every action. This is the stage enterprises will actually deploy, and it is where the organizational memory starts compounding.
Near-term: Task Agents with human oversight. As trust accrues, the layer takes on bounded, well-understood workflows end to end (for example, the eviction steps for a confirmed account compromise) under explicit human oversight and approval gates. The 70% running human-in-the-loop today are describing exactly this stage.
Future: orchestrating autonomous agents. As the models mature and the captured judgment deepens, the layer coordinates the native and task agents across the whole SOC (weighing verdicts, resolving conflicts, sequencing actions) and becomes the operating system of the AI-powered SOC.
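
The approval gate in the near-term stage can be sketched as a small state machine: an agent may propose an action, but execution is refused until a named human has approved it. The ProposedAction class and its fields are hypothetical, and the action itself is passed in as a plain callable so the sketch does not pretend to call any real remediation API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class ProposedAction:
    description: str
    reasoning: str                     # shown to the approver with the request
    status: Status = Status.PROPOSED
    approved_by: Optional[str] = None

    def approve(self, analyst: str) -> None:
        if self.status is not Status.PROPOSED:
            raise ValueError("only a proposed action can be approved")
        self.status = Status.APPROVED
        self.approved_by = analyst

    def reject(self, analyst: str) -> None:
        if self.status is not Status.PROPOSED:
            raise ValueError("only a proposed action can be rejected")
        self.status = Status.REJECTED

    def execute(self, run: Callable[[], str]) -> str:
        # The gate: nothing runs without a recorded human approval.
        if self.status is not Status.APPROVED:
            raise PermissionError("human approval required before execution")
        result = run()
        self.status = Status.EXECUTED
        return result


action = ProposedAction(
    description="revoke active sessions for jdoe",
    reasoning="confirmed account compromise",
)
action.approve("senior_analyst_1")
print(action.execute(lambda: "sessions revoked"))
```

Calling execute before approve raises PermissionError, which is the behavior the human-in-the-loop stage depends on.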

Each stage depends on the one before it. You cannot orchestrate autonomous agents safely until you have captured the judgment that tells you when their verdicts are wrong. You cannot earn the right to take action until you have spent months proving your recommendations inside the team's real workflow. Autonomy is not a switch you flip; it is trust you accumulate, and the organizational memory is the ledger that trust is written in.

The three gaps (human intelligence, operational fit, and agent judgment) are why AI for the SOC has been stuck in experimentation. They are also a roadmap. Close them in the right order, embedded in the control tower the team already runs, and you build the missing layer the autonomous SOC actually needs: one that captures the judgment, judges the native agents, and earns the right to act, one month of compounding memory at a time. The same three gaps recur beyond the SOC, across vulnerability management, data security, and insider risk, which is why getting this layer right is the foundation, not the feature.