Judging the Native Security Agents: Security Copilot, Charlotte AI, and Cortex

The market shifted under the AI-SOC startups. For years, external agents from Simbian, Dropzone, and others investigated alerts coming out of Microsoft Defender, CrowdStrike Falcon, and Palo Alto Cortex. Now the platforms ship their own investigation agents, and for their own alerts they win. Microsoft Security Copilot, CrowdStrike Charlotte AI, and Palo Alto Cortex each sit on telemetry and detection logic no outsider can fully see. If your AI-SOC strategy rests on out-investigating a native agent on its home turf, it rests on sand. The durable play is orchestration and, above all, judgment.

Why the built-in agents win their own turf

A native agent has two advantages an external one cannot copy.

First, access to telemetry and detection logic. When Defender raises an alert, Security Copilot can see the full reasoning behind the detection: which signals fired, what the behavioral model weighed, the raw device and identity events an external connector never exposes through its API. Charlotte AI reads Falcon's process trees and sensor telemetry at a depth the CrowdStrike API does not surface to third parties. Cortex's agents sit inside Palo Alto's own data lake. An external agent works from whatever the platform chooses to expose, usually a normalized alert object and a few enrichment endpoints. For that platform's own alerts, the native agent works from the source while the outsider works from a summary.

Second, economics. These agents are becoming free, or close to it:

Microsoft is bundling Security Copilot capabilities into the E5 licensing tier many enterprises already pay for.
CrowdStrike is seeding Charlotte AI adoption with free credits.

When the native agent is both better at its own alerts and effectively free, selling an external agent that does the same investigation is not a sustainable business. The platforms hold structural advantages on both axes, telemetry access and price, and an independent vendor cannot close either gap.

That is the uncomfortable read for anyone whose product is "we investigate Defender alerts better than Microsoft does." The conclusion is not to give up on AI in the SOC. It is to move up a layer. For the full argument on where AI creates durable value in the SOC, see our pillar on AI for the SOC.

Orchestrate, don't replace

No enterprise SOC runs one detection platform. A typical environment has Defender on endpoints and identity, Falcon on a different fleet from an acquisition, Cortex or Google SecOps in the network and cloud path, plus identity signals from Okta and Entra and a dozen more tools feeding the queue. Each of those platforms is shipping its own agent. So the near-future SOC does not have one AI agent. It has several, each expert on its own slice, each blind to the others.

That creates a coordination problem the platforms have no incentive to solve. Microsoft will not orchestrate CrowdStrike's agent. CrowdStrike will not weigh Palo Alto's verdict against its own. Each native agent investigates within its own walls and hands back a verdict scoped to what it can see. A single real incident does not respect those walls:

An AiTM phishing session shows up as a Safe Links click and an anomalous Entra sign-in (both Security Copilot's turf), and then lateral movement on an endpoint that happens to run Falcon (Charlotte's turf).
A BEC case spans identity, mailbox, and OAuth-grant telemetry that no single platform agent sees end to end.

Someone has to run these agents as a team: route the right question to the right agent, collect the verdicts, and assemble them into one coherent picture. That orchestration layer is vendor-neutral by definition, because its whole job is to sit above agents that will never coordinate themselves. This is where Cade's Tool Intelligence matters: it aggregates the 10-20 products and 200+ tools a SOC runs into one coherent set with AI-optimized descriptions, so the orchestrator routes work to the right agent instead of guessing.
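A minimal sketch of that routing step, in Python. Everything here is hypothetical: the platform keys, the `Signal` and `Verdict` shapes, and the stub agents stand in for whatever each vendor's agent actually accepts and returns, and none of it is Cade's Tool Intelligence API. It only shows the shape of the job: send each signal to the agent that owns its platform, collect every verdict, and record coverage gaps instead of guessing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Signal:
    platform: str  # which product raised it, e.g. "defender", "falcon" (hypothetical keys)
    summary: str   # what that product saw

@dataclass(frozen=True)
class Verdict:
    agent: str     # which agent produced the verdict
    label: str     # e.g. "benign", "suspicious", "unknown" (hypothetical labels)
    rationale: str

# A native agent is modeled as a plain function from one signal to one verdict.
Agent = Callable[[Signal], Verdict]

def orchestrate(signals: List[Signal], agents: Dict[str, Agent]) -> List[Verdict]:
    """Route each signal to the agent that owns its platform and collect every verdict."""
    verdicts: List[Verdict] = []
    for signal in signals:
        agent = agents.get(signal.platform)
        if agent is None:
            # No native agent covers this platform: record the gap rather than guess.
            verdicts.append(Verdict("unrouted", "unknown", f"no agent for {signal.platform}"))
        else:
            verdicts.append(agent(signal))
    return verdicts

def stub_agent(name: str, label: str) -> Agent:
    """Hypothetical stand-in for a real native agent; always returns a fixed label."""
    return lambda signal: Verdict(name, label, signal.summary)

# Usage: one AiTM incident whose signals land on different platforms.
agents = {
    "defender": stub_agent("security_copilot", "suspicious"),
    "falcon": stub_agent("charlotte_ai", "benign"),
}
incident = [
    Signal("defender", "Safe Links click"),
    Signal("defender", "anomalous Entra sign-in"),
    Signal("falcon", "process lineage looks clean"),
    Signal("okta", "MFA push accepted"),
]
verdicts = orchestrate(incident, agents)
for v in verdicts:
    print(v.agent, v.label, v.rationale)
```

The unrouted Okta signal is the point of the design: a coordinator should surface coverage gaps explicitly, because no native agent will report what it cannot see.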

The real gap: judgment

Orchestration gets the agents talking to one upstream coordinator. It does not tell you what to do when they disagree, or when one of them is confidently wrong. That requires judgment, and judgment is the layer the native agents structurally lack.

Judging an agent's verdict means doing four things the agents cannot do for themselves.

Reconcile conflicting verdicts. Charlotte AI marks an endpoint benign because the process lineage looks clean. Security Copilot flags the same user's session as high-risk because the token appeared without a fresh interactive authentication. Both are correct within their own data. Only a layer that sees both verdicts at once can notice that a clean endpoint plus an anomalous token is exactly the signature of a stolen session cookie, not a contradiction to be averaged away. Reading those cross-tool signatures is the core skill in investigating identity-based attacks.
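A sketch of the difference between averaging and reconciling, assuming simplified verdicts that carry a label plus the facts behind it. The labels, the fact name, and the single rule are hypothetical; a real judgment layer would hold many such cross-tool patterns.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Verdict:
    agent: str
    label: str             # "benign" or "high_risk" (hypothetical labels)
    facts: FrozenSet[str]  # observations the agent based its label on

RISK = {"benign": 0.0, "high_risk": 1.0}

def naive_average(*verdicts: Verdict) -> float:
    """What NOT to do: blend the labels into one number."""
    return sum(RISK[v.label] for v in verdicts) / len(verdicts)

def reconcile(endpoint: Verdict, identity: Verdict) -> str:
    """Read the endpoint and identity verdicts together and name the combined pattern."""
    clean_device = endpoint.label == "benign"
    token_anomaly = (identity.label == "high_risk"
                     and "token_without_fresh_interactive_auth" in identity.facts)
    if clean_device and token_anomaly:
        # The session is in use somewhere other than the clean device:
        # consistent with a stolen session cookie being replayed.
        return "suspected_session_token_theft"
    if endpoint.label == identity.label:
        return endpoint.label
    return "conflict_needs_review"

charlotte = Verdict("charlotte_ai", "benign", frozenset({"process_lineage_clean"}))
copilot = Verdict("security_copilot", "high_risk",
                  frozenset({"token_without_fresh_interactive_auth"}))

print(naive_average(charlotte, copilot))  # 0.5, a "medium" that hides the attack
print(reconcile(charlotte, copilot))      # suspected_session_token_theft
```

Averaging turns a stolen-session signature into a lukewarm 0.5; the rule keeps both facts and names the pattern they form together.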

Apply organizational context the agent can't see. A native agent does not know that the "impossible travel" sign-in belongs to an exec who travels every week, that the host the alert touched is a production database you must protect right now, or that this service account always logs in from that ASN because of a batch job nobody documented. That context lives in the heads of your senior analysts and in six months of accumulated team decisions, not in the platform's telemetry. It is the single biggest reason an agent hands the verdict back to a human. We go deep on why this knowledge never makes it into a document, and what to do about it, in our piece on tribal knowledge in the SOC.
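One way to make that context usable is to store it as data the judgment layer consults before a verdict is acted on. The sketch assumes a hypothetical `OrgContext` holding the three kinds of team knowledge from the examples above; the field names, alert types, and the ASN (64500, from the range reserved for documentation) are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class OrgContext:
    """Hypothetical store of team knowledge that no platform's telemetry carries."""
    frequent_travelers: Set[str] = field(default_factory=set)
    service_account_asns: Dict[str, Set[int]] = field(default_factory=dict)
    critical_hosts: Set[str] = field(default_factory=set)

def annotate(alert_type: str, user: str, ctx: OrgContext,
             asn: Optional[int] = None, host: Optional[str] = None) -> List[str]:
    """Attach what the team already knows to an alert; annotate, never silently suppress."""
    notes: List[str] = []
    if alert_type == "impossible_travel" and user in ctx.frequent_travelers:
        notes.append(f"{user} travels weekly: check the itinerary before dismissing")
    if asn is not None and asn in ctx.service_account_asns.get(user, set()):
        notes.append(f"ASN {asn} is the usual source for {user} (known batch job)")
    if host is not None and host in ctx.critical_hosts:
        notes.append(f"{host} is a critical production asset: weigh containment impact")
    return notes

ctx = OrgContext(
    frequent_travelers={"cfo@corp.example"},
    service_account_asns={"svc-etl": {64500}},
    critical_hosts={"db-prod-1"},
)
print(annotate("impossible_travel", "cfo@corp.example", ctx))
print(annotate("unusual_asn_sign_in", "svc-etl", ctx, asn=64500, host="db-prod-1"))
```

The function annotates rather than suppresses: a known traveler can still be compromised, so context changes how the verdict is read, not whether anyone sees it.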

Weigh confidence honestly. Two agents marking an alert "suspicious" at very different confidence levels, on very different evidence quality, should not carry equal weight. Judging means labeling what was directly Observed versus Correlated across sources versus Assessed by analyst inference, and propagating that uncertainty into the decision instead of collapsing it into a single score.
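A sketch of keeping those tiers explicit, assuming each piece of evidence is tagged when it is collected. The tier names come from the text; the `Evidence` shape and the `basis_for_action` policy (the strongest tier present caps how far automation may go) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List

class Tier(IntEnum):
    """Evidence tiers from the text; a higher value means more direct evidence."""
    ASSESSED = 1    # inferred by an analyst or a model
    CORRELATED = 2  # joined across two or more sources
    OBSERVED = 3    # seen directly in raw telemetry

@dataclass(frozen=True)
class Evidence:
    claim: str
    tier: Tier
    source: str

def by_tier(evidence: List[Evidence]) -> Dict[str, List[str]]:
    """Keep the claims grouped by tier instead of collapsing them into one score."""
    grouped: Dict[str, List[str]] = {t.name: [] for t in sorted(Tier, reverse=True)}
    for e in evidence:
        grouped[e.tier.name].append(f"{e.claim} ({e.source})")
    return grouped

def basis_for_action(evidence: List[Evidence]) -> str:
    """Hypothetical policy: the strongest tier present caps how far automation may go."""
    if not evidence:
        return "no_basis"
    best = max(e.tier for e in evidence)
    return {Tier.OBSERVED: "may_act",
            Tier.CORRELATED: "corroborate_first",
            Tier.ASSESSED: "escalate_to_human"}[best]

evidence = [
    Evidence("token issued without interactive auth", Tier.OBSERVED, "Entra sign-in log"),
    Evidence("same user active on a clean Falcon host", Tier.CORRELATED, "Entra + Falcon"),
    Evidence("session cookie likely stolen", Tier.ASSESSED, "analyst"),
]
print(by_tier(evidence))
print(basis_for_action(evidence))      # may_act
print(basis_for_action(evidence[2:]))  # escalate_to_human
```

The grouped output is what a human or a downstream policy reads, so an inference never masquerades as an observation.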

Decide act-versus-escalate. The final call: contain the device now, revoke the sessions and refresh tokens, or escalate to a human. That decision depends on blast radius and business impact, which is exactly the context the native agents do not hold. A persistent knowledge graph that resolves the same user, host, and session across all of these tools is what makes blast-radius reasoning possible in the first place.
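A toy version of that reasoning, assuming a hypothetical in-memory graph rather than any real product's storage. Union-find resolves tool-specific identifiers (an Entra UPN, a Windows logon name seen on a Falcon host) to one entity, a bounded breadth-first search measures the blast radius, and a deliberately simple policy, also hypothetical, picks act versus escalate.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

class EntityGraph:
    """Tiny in-memory stand-in for a persistent cross-tool knowledge graph (hypothetical)."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}        # union-find over tool-specific identifiers
        self.links: List[Tuple[str, str]] = []  # raw relationships as each tool reported them

    def resolve(self, name: str) -> str:
        """Return the canonical entity for a tool-specific identifier."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def same_entity(self, alias: str, canonical: str) -> None:
        """Record that `alias` names the same real entity as `canonical`."""
        self.parent[self.resolve(alias)] = self.resolve(canonical)

    def link(self, a: str, b: str) -> None:
        """Record a relationship such as user-owns-session or user-logged-on-to-host."""
        self.links.append((a, b))

    def blast_radius(self, start: str, max_hops: int = 2) -> Set[str]:
        """Canonical entities reachable from `start` within `max_hops` relationships."""
        adjacency: Dict[str, Set[str]] = defaultdict(set)
        for a, b in self.links:
            ca, cb = self.resolve(a), self.resolve(b)
            if ca != cb:
                adjacency[ca].add(cb)
                adjacency[cb].add(ca)
        origin = self.resolve(start)
        seen = {origin}
        queue = deque([(origin, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return seen - {origin}

def act_or_escalate(radius: Set[str], critical: Set[str], max_auto: int = 3) -> str:
    """Hypothetical policy: auto-contain only small blast radii that avoid critical assets."""
    if radius & critical or len(radius) > max_auto:
        return "escalate"
    return "contain"

g = EntityGraph()
g.same_entity("alice@corp.example", "user:alice")  # identity as seen in Entra sign-ins
g.same_entity("CORP\\alice", "user:alice")         # logon name on a Falcon host
g.link("alice@corp.example", "session:abc")
g.link("CORP\\alice", "host:laptop-7")
g.link("host:laptop-7", "host:db-prod-1")

near = g.blast_radius("session:abc", max_hops=2)
far = g.blast_radius("session:abc", max_hops=3)
print(sorted(near), act_or_escalate(near, {"host:db-prod-1"}))  # contain
print(sorted(far), act_or_escalate(far, {"host:db-prod-1"}))    # escalate
```

Without the resolution step, `alice@corp.example` and `CORP\alice` would be two unrelated nodes, and the path from the stolen session to the production database would never appear.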

This is the human-intelligence layer. The agents investigate; the judgment of which verdict to trust, weighed against what your team knows, is the work that was never in any document and never will be.

Toward an autonomous SOC

The industry talks about the autonomous SOC as if more capable investigation agents will get us there. They won't, on their own. Today only about 14% of enterprises let AI take independent remediation actions, while 70% keep a human in the loop, and that ratio reflects a real gap, not just caution. The agents can investigate. What is missing is the layer that decides whether to trust a given investigation, reconciles it against the other agents, and weighs it against organizational context before anything acts.

That is the missing piece for full autonomy. Orchestration coordinates the agents; judgment makes their output trustworthy enough to act on without a human reading every verdict. Build that layer, feed it the organizational memory that compounds month over month, and the human-in-the-loop percentage can fall because the loop has been encoded, not because anyone got reckless.

The native agents have won the investigation layer, and that is fine. The next layer up, orchestrating those agents and judging their verdicts with the human intelligence your team holds, is open, vendor-neutral by necessity, and exactly where Cade is built to sit. That is what turns a fleet of competing native agents into a single SOC that can eventually run itself.