A vendor ships a security patch. Your change board schedules rollout for next week. In the old model, that delay was uncomfortable but survivable. In the new model, the patch itself may be enough for an AI-assisted attacker to build a working exploit before your first deployment ring finishes.

TL;DR

  • Anthropic reported that Claude Mythos Preview built working exploits from recent Firefox and Windows patches in hours, under constrained lab conditions.
  • The key risk is not magical zero-day discovery. It is faster weaponization of already disclosed vulnerabilities during the patch gap.
  • Verizon’s 2026 DBIR says vulnerability exploitation is now the top breach entry point, appearing in 31% of breaches.
  • Monthly patch cycles, broad staged rollouts, and CVSS-only prioritization are too slow for internet-facing and high-value systems.
  • Defenders need exploitability-aware patch SLAs, faster emergency lanes, compensating controls, and detection tied to vulnerable assets.

What Anthropic Actually Tested

On June 8, 2026, Anthropic published research measuring how large language models affect N-day exploitation. An N-day is not an unknown bug. It is a vulnerability that has already been disclosed or patched, while some systems remain unpatched.

That distinction matters. N-days live in the patch gap: the period between public fix availability and real-world remediation. Attackers can compare vulnerable and fixed code, inspect changed binaries, review advisory language, and infer the bug the patch was designed to remove. That process is called patch diffing.
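As a rough illustration of why a patch reveals the bug it removes, the sketch below diffs two versions of a small function with Python's standard difflib module. The function, its contents, and the file names are hypothetical and exist only for this example; real patch diffing works on much larger source trees or on compiled binaries.

```python
import difflib

# Hypothetical "before" and "after" versions of a function (illustrative only).
vulnerable = """\
def read_item(buf, index):
    return buf[index]
""".splitlines(keepends=True)

fixed = """\
def read_item(buf, index):
    if index < 0 or index >= len(buf):
        raise IndexError("index out of range")
    return buf[index]
""".splitlines(keepends=True)

# Unified diff between the two versions, as a reviewer or attacker would see it.
diff = list(difflib.unified_diff(vulnerable, fixed,
                                 fromfile="vulnerable.py", tofile="fixed.py"))

# Lines the patch added, skipping the "+++" file header.
added = [line[1:].strip() for line in diff
         if line.startswith("+") and not line.startswith("+++")]

print("".join(diff))
print(added)
```

The added lines are a bounds check, which tells any reader that the unpatched version accepted an out-of-range index. The fix documents the flaw.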

Anthropic evaluated two classes of targets:

| Target | What the model received | Reported result |
| --- | --- | --- |
| Firefox SpiderMonkey | Public patch diff, component name, severity, vulnerable and fixed jsshell builds | Mythos Preview produced PoCs for 14 of 18 patches and 8 working code-execution exploits |
| Windows kernel | Vulnerable and patched binaries, public symbols, decompiler output, function-level diff, Microsoft advisory text | Mythos Preview produced PoCs for 18 of 21 local privilege escalation bugs and 8 full SYSTEM exploit chains |

The test was not a full intrusion simulation. Anthropic did not claim the model solved target discovery, delivery, persistence, evasion, or post-exploitation. The important finding is narrower and more useful for defenders: the exploit-development step, historically bottlenecked by scarce reverse engineering skill, can be compressed into hours when a strong model has the patch artifacts and a usable harness.

Why This Changes the Patch Gap

Mandiant’s earlier time-to-exploit research already showed the window shrinking. In its 2021-2022 dataset of exploited vulnerabilities, Mandiant found that N-day exploitation was most likely in the first month after a patch shipped, with 29 N-day vulnerabilities exploited within that month.

Anthropic’s result pushes the operational assumption further. The first question is no longer “Will a public exploit appear before our next maintenance window?” The better question is: “Can a capable operator build one before our normal rollout meaningfully reduces exposure?”

Microsoft’s own Windows Autopatch documentation illustrates the tension. In a typical broad-ring example, devices wait seven days before downloading a quality update, with later deadlines and forced restart behavior depending on policy. That is reasonable for user experience and fleet stability. It is not designed for an exploit-development clock measured in hours rather than weeks.
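The mismatch can be put in rough numbers. The sketch below converts a ring's deferral into hours and compares it with an exploit-development time. The seven-day deferral comes from the broad-ring example above; the 12-hour figure is an assumption chosen only to stand in for "hours", not a number reported by Anthropic or Microsoft.

```python
def exposure_window_hours(deferral_days, deadline_days=0):
    """Hours a device may stay unpatched after release under a ring policy."""
    return (deferral_days + deadline_days) * 24

# Broad-ring example: devices wait seven days before downloading the update.
broad_ring_hours = exposure_window_hours(deferral_days=7)

# ASSUMPTION: illustrative exploit-development time, not a measured value.
ASSUMED_EXPLOIT_DEV_HOURS = 12

# How many times over the assumed exploit clock fits inside the deferral.
headroom = broad_ring_hours / ASSUMED_EXPLOIT_DEV_HOURS

print(broad_ring_hours)  # 168
print(headroom)          # 14.0
```

Under these assumptions the deferral alone is 14 times longer than the exploit clock, before any install deadline or restart grace period is added.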

This does not mean every patch instantly becomes a mass-exploitation event. Attackers still need a reachable target, a delivery path, reliability, and a reason to care. But for exposed edge systems, browsers, collaboration platforms, identity infrastructure, VPNs, firewalls, and endpoint privilege escalation bugs, defenders should assume the reverse-engineering barrier is falling.

The DBIR Context: Exploitation Is Already Winning

Verizon’s 2026 DBIR adds the breach-level context. Verizon reported that vulnerability exploitation became the top breach entry point for the first time in the DBIR’s 19-year history, appearing in 31% of breaches. Verizon also framed AI-driven speed as a new challenge that pushes defenders back toward basic resilience: reduce attack surface, prioritize better, and patch what matters faster.

That is the uncomfortable part. The industry was already losing the remediation race before frontier-model exploit assistance became broadly normalized.

The practical lesson is not “patch everything instantly.” That is not possible for most enterprises. The lesson is to stop treating all patches as equal tickets in a monthly queue.

What Defenders Should Change

Create an emergency patch lane

Internet-facing and identity-adjacent systems need a separate SLA. A critical VPN, firewall, SSO, browser, EDR, hypervisor, mail gateway, or collaboration platform bug should not wait behind ordinary workstation hygiene tickets.

Use CISA KEV, ENISA’s European Vulnerability Database (EUVD), vendor exploited-in-the-wild statements, public exploit availability, EPSS, asset exposure, and business criticality as inputs. CVSS is useful, but it is not enough.

For European teams, EUVD belongs in the same workflow as KEV. ENISA launched EUVD in May 2025 under the NIS2 Directive to aggregate vulnerability information, mitigation guidance, and exploitation status for ICT products and services. ENISA also says CISA KEV information is automatically transferred into EUVD, which makes it useful as a European situational-awareness layer rather than a replacement for KEV.
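One way to combine the inputs listed above is a small lane-assignment function. The sketch below is a minimal example under stated assumptions: the field names, the three lane names, and both thresholds are hypothetical, and the CVE identifiers are placeholders. A real workflow would pull these signals from KEV, EUVD, EPSS, and the asset inventory.

```python
from dataclasses import dataclass

@dataclass
class VulnSignal:
    cve_id: str
    cvss: float              # CVSS base score, 0.0 to 10.0
    epss: float              # EPSS probability, 0.0 to 1.0
    in_kev: bool             # listed in CISA KEV
    euvd_exploited: bool     # EUVD reports exploitation
    vendor_exploited: bool   # vendor says exploited in the wild
    public_exploit: bool     # public exploit code is available
    internet_facing: bool    # affected asset is reachable from the internet
    business_critical: bool  # affected asset is business critical

# ASSUMPTION: illustrative thresholds, to be tuned per organization.
EPSS_THRESHOLD = 0.5
CVSS_THRESHOLD = 7.0

def patch_lane(v):
    """Assign a patch lane from exploitation evidence and asset value."""
    known_exploited = v.in_kev or v.euvd_exploited or v.vendor_exploited
    likely_exploited = v.public_exploit or v.epss >= EPSS_THRESHOLD
    high_value = v.internet_facing or v.business_critical
    if high_value and (known_exploited or likely_exploited):
        return "emergency"
    if known_exploited or likely_exploited or (high_value and v.cvss >= CVSS_THRESHOLD):
        return "expedited"
    return "standard"

# Placeholder records, not real CVEs.
vpn_bug = VulnSignal("CVE-0000-0001", cvss=7.5, epss=0.02, in_kev=True,
                     euvd_exploited=False, vendor_exploited=False,
                     public_exploit=False, internet_facing=True,
                     business_critical=True)
workstation_bug = VulnSignal("CVE-0000-0002", cvss=9.8, epss=0.01, in_kev=False,
                             euvd_exploited=False, vendor_exploited=False,
                             public_exploit=False, internet_facing=False,
                             business_critical=False)

print(patch_lane(vpn_bug))          # emergency
print(patch_lane(workstation_bug))  # standard
```

The lower-scored VPN bug outranks the 9.8 workstation bug because exploitation evidence and exposure drive the lane, with CVSS used only as a secondary input.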

Treat patch release as a detection trigger

When a vendor ships a high-risk fix, start hunting before exploitation is confirmed in the wild. Useful questions:

  • Which exposed assets run the affected product and version?
  • Which controls can reduce reachability until the patch lands?
  • Which logs would show attempted exploitation, crashes, unusual child processes, new service creation, or privilege escalation?
  • Which accounts, hosts, and network paths would become reachable if the vulnerable system falls?
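The first question in the list above is an inventory query. The sketch below shows one possible shape for it, assuming a hypothetical asset record with a product name, a purely numeric dotted version string, and an exposure flag. The product names, hostnames, and version numbers are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    hostname: str
    product: str
    version: str          # ASSUMPTION: purely numeric dotted version, e.g. "9.1.2"
    internet_facing: bool

def parse_version(text):
    """Turn '9.1.2' into (9, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def exposed_and_affected(assets, product, fixed_version):
    """Internet-facing assets running the product below the fixed version."""
    fixed = parse_version(fixed_version)
    return [a for a in assets
            if a.product == product
            and a.internet_facing
            and parse_version(a.version) < fixed]

# Hypothetical inventory.
inventory = [
    Asset("vpn-edge-01", "ExampleVPN", "9.1.2", True),
    Asset("vpn-edge-02", "ExampleVPN", "9.1.3", True),
    Asset("vpn-lab-01", "ExampleVPN", "9.0.0", False),
    Asset("mail-gw-01", "ExampleMail", "2.0", True),
]

targets = exposed_and_affected(inventory, "ExampleVPN", "9.1.3")
print([a.hostname for a in targets])  # ['vpn-edge-01']
```

Version strings with suffixes such as "-beta" would need a real version parser. The point is that the query should take minutes to run, which depends on the inventory already recording product, version, and exposure.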

For Windows local privilege escalation bugs, detection rarely starts with an external scan. Look for suspicious crash artifacts, abnormal driver interactions, unexpected SYSTEM process creation, unusual service installation, token abuse, and post-exploitation movement from a previously low-privilege context.
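One of these signals, unexpected SYSTEM process creation, can be sketched as a filter over process-creation events. The event fields below are hypothetical and do not follow any particular EDR or Windows event schema. The rule is also deliberately naive: it flags any SYSTEM process whose parent ran as a different account, so a real deployment would need tuning for legitimate elevation paths.

```python
SYSTEM_ACCOUNT = "NT AUTHORITY\\SYSTEM"

def suspicious_elevations(events):
    """Return events where a SYSTEM process was spawned by a non-SYSTEM parent."""
    hits = []
    for event in events:
        child_is_system = event["user"].upper() == SYSTEM_ACCOUNT
        parent_is_system = event["parent_user"].upper() == SYSTEM_ACCOUNT
        if child_is_system and not parent_is_system:
            hits.append(event)
    return hits

# Hypothetical process-creation events (field names are assumptions).
events = [
    {"image": "cmd.exe", "user": "NT AUTHORITY\\SYSTEM",
     "parent_image": "report_viewer.exe", "parent_user": "CORP\\alice"},
    {"image": "svchost.exe", "user": "NT AUTHORITY\\SYSTEM",
     "parent_image": "services.exe", "parent_user": "NT AUTHORITY\\SYSTEM"},
    {"image": "notepad.exe", "user": "CORP\\alice",
     "parent_image": "explorer.exe", "parent_user": "CORP\\alice"},
]

hits = suspicious_elevations(events)
print([e["image"] for e in hits])  # ['cmd.exe']
```

Scoping a rule like this to hosts that still run the vulnerable build keeps the alert volume manageable and ties detection to the patch gap.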

For browser and JavaScript engine bugs, focus on browser crash telemetry, endpoint exploit prevention events, suspicious renderer behavior, unusual child processes, and the initial access path that delivered the malicious content.

Use compensating controls deliberately

When patching is delayed, compensate with controls that match the attack path:

| Risk | Temporary control |
| --- | --- |
| Internet-facing appliance | Restrict management interfaces, apply vendor mitigations, block known exploit paths, increase logging |
| Browser RCE | Force browser restart, reduce extension risk, isolate high-risk browsing, monitor exploit prevention events |
| Local privilege escalation | Limit local admin paths, harden EDR tamper protection, monitor service and driver creation |
| Identity or SSO component | Reduce external exposure, enforce phishing-resistant MFA, watch token and session anomalies |
| Legacy OT or medical system | Segment aggressively, restrict protocol paths, add compensating detection near choke points |

Compensating controls are not a substitute for patching. They are a way to reduce exposure while the patch is tested, staged, or blocked by uptime constraints.

Recalibrate “exploitation unlikely”

Anthropic reported that Microsoft had rated many of the tested Windows kernel vulnerabilities as “Exploitation Less Likely” or “Exploitation Unlikely,” yet Mythos Preview still produced PoCs for most of that subset and one full privilege escalation chain for a bug rated “Exploitation Unlikely.”

That does not mean vendor exploitability ratings are wrong. It means many ratings were calibrated for human capability, economics, and historical exploit-development difficulty. AI-assisted reverse engineering changes those economics.

Security teams should treat exploitability ratings as one signal, not a veto. If a vulnerability affects a privileged component on important assets, the patch gap still matters.

The New Operating Assumption

The old remediation model assumed scarcity: few people could turn a patch into a reliable exploit quickly. That assumption bought defenders time.

The new model assumes repeatability: capable operators can use models, harnesses, diffing tools, and automation to test many patches in parallel. Most attempts will still fail. Some will work. The cost of trying keeps falling.

For defenders, the answer is not panic. It is triage discipline.

Build an inventory that can answer exposure questions quickly. Give internet-facing and identity-critical patches a faster lane. Use KEV and threat intelligence to prioritize, but do not wait for confirmed exploitation before looking at your own logs. Tie detection to vulnerable assets. Reduce unnecessary exposure before the next advisory drops.

N-day is starting to sound too slow. For the systems attackers care about most, the defender’s real window may already be measured in hours.


Sources