COST, AI frontier models and more: A measured take on the future of security testing

May 4, 2026

Is this the future of pentesting?

Gartner recently released a report titled ‘The Future of Pen Testing is Continuous Offensive Security Testing’ (Dhivya Poole, Carlos De Sola Caraballo, Mitchell Schneider), along with a second that covers implementation.

The vision it sets out is compelling, and many of the points raised are valid. But is there anything truly new here? And will it lead to real change, or will traditional pentesting continue to thrive?

In this article, we give a brief overview of Gartner’s position and consider the ramifications of Mythos, OpenAI’s GPT-5.5 and DeepSeek V4 Pro, while sharing YesWeHack’s thoughts on where the conversation on security testing needs to go.

Key takeaways from Gartner’s report

  • Security testing is under pressure. Expanding attack surfaces, faster exploits and the impact of AI are forcing security teams to reevaluate how they approach security testing
  • Take a hybrid approach. In response, security teams should combine human and automated testing across continuous and time-bound engagements to manage vulnerability risk
  • Can’t test what you can’t see. A comprehensive, real-time view of the attack surface is fundamental to security testing. An incomplete picture undermines an otherwise sensible testing program
  • Noise is the enemy. Only vulnerabilities that are verifiably exploitable in your environment are worth acting on. Verified findings must be prioritised based on real business impact
  • The human in the loop. Automation and AI can accelerate testing and increase coverage, but expert judgment remains irreplaceable for depth, context and creativity
  • Aim to maximise efficiency. Security teams should aim to integrate vulnerability workflows, centralise findings and optimise risk-based prioritisation

Traditional pentesting cannot keep pace

Gartner’s view is that periodic manual pentesting is no longer sufficient to keep pace with modern IT environments or the evolving threat landscape. The report highlights rapid cloud deployment, identity changes, API evolution and AI-enabled threats as the main culprits in creating exposure windows that remain open for too long.

Instead, organisations should switch to continuous offensive security testing (COST), which is described as “a trigger-driven, intelligence-led model that activates validation when material risk changes, not when the calendar indicates.”

RELATED Validation: Your path to overcoming alert fatigue in vulnerability management

If you’ve been in cybersecurity for a while, these prescriptions should sound familiar. And while periodic pentesting engagements are generally required for compliance purposes, most organisations no longer rely on them exclusively for security risk management.

But Gartner urges organisations to go further by combining pentesting, red teaming, Bug Bounty and security control validation to create a continuous testing program that quickly identifies, validates, prioritises and resolves security issues. This approach combines human and automated testing across both continuous and time-bound engagements.

Beyond this, the report lists three requirements for security testing programs:

  • Trigger-driven. Teams should define events to auto-initiate testing or validation – eg new deployments, exposed assets and “threat-intel spikes”
  • Intelligence-led. Internal data should be correlated with threat intelligence to determine the risk and urgency of potential exposures
  • Integrated. Testing and triggers should be built into existing operations – eg ITSM, SecOps, DevOps and CI/CD

The stated purpose of all this is to reduce risk and remediation time. So, naturally, the proposed metrics mainly cover testing speed, coverage and risk-based triggering, along with prompt remediation of the highest risk issues.

Gartner’s approach is closely tied to its Continuous Threat Exposure Management (CTEM) model. It uses triggers and intelligence – combined with internal data from attack surface management, security information and event management (SIEM) and other sources – to apply different testing methods as needed, minimising exposure windows and prioritising the highest-risk issues.

Moving in the right direction

Gartner’s ‘future of pentesting’ vision is directionally sound. However, YesWeHack has a few reservations that are worth exploring.

1. Will traditional pentesting REALLY be replaced?

The claim that “traditional pentesting is no longer sufficient” isn’t new. Vendors, analysts and other industry players have long claimed the standard model of annual pentesting will be left behind… but so far, it’s been remarkably resilient to the changes occurring around it.

Annual pentests are an audit expectation for most regulatory and compliance frameworks. Organisations can and should use a range of testing and validation measures. But rarely can they get away with excluding those all-important pentests.

And the report’s authors know this. They talk extensively about security, assurance and risk… but only use the word ‘compliance’ once.

A continuous and diverse testing program that incorporates Bug Bounty, automated tools and red teaming is undoubtedly superior for security and risk management. Combined with effective program measurement and reporting, it should also be better for assurance (though that depends on who needs assuring).

But will the need for periodic traditional pentesting really go away?

Maybe. We’ll have to wait and see. History suggests not, but things can change.

2. Triggers may not be as automated as you’d like

The concept of trigger-driven testing is appealing. Having an automated way to deploy the appropriate security testing method at precisely the right moments and in the right areas would be a win for risk management.

The trouble is that testing isn’t free. Most security testing methodologies have a financial cost, a human resource requirement, an operationally disruptive element… or a combination of these. If testing had no cost, it would be done continuously, 100% of the time (and that’s exactly how certain types of testing are delivered).

So, when we talk about trigger-driven testing, we aren’t just talking about risk management; we’re also talking about an investment of security resources.

Sometimes, the ROI for trigger-driven testing will be acceptable – generally, for triggers that can be automated with a high degree of certainty and at low cost. For example, automated testing within the CI/CD pipeline is already common practice, and you might also automate testing on discovery or deployment of new externally facing assets.

But in many cases, automated triggers are not feasible, particularly when the appropriate testing method (for example, Bug Bounty or red teaming) has both financial and operational costs.

It might sound appealing to have automated triggers for human-led testing, but will security leaders really be willing to allow automated tools to materially change their Bug Bounty scope and incentives based on ‘threat intelligence spikes’, with no human input? Not likely.

Still, it’s reasonable to set triggers and thresholds, even if your launch process retains a human element. And there will be opportunities for trigger-driven automated testing, or at the very least, automated prompts to schedule testing windows.

3. Potential for noise

No discussion of vulnerability management is complete without mentioning noise.

When changing or improving security testing methodology, you must consider how it will affect the volume of issues requiring human intervention. Transitioning away from annual pentesting and towards continuous testing will mean more findings.

The question is: are those findings useful?

The value of findings is never neutral. They can be classified as either:

  1. Helpful, because they are valid, represent real-world risk and must be fixed
  2. Harmful, because they are false positives or low-quality issues that waste time

This is true unless there’s a mechanism to identify and remove low-quality (invalid, not reproducible, very low risk, etc.) issues before they reach security teams. In other words, it’s fine for security testing to uncover low-quality findings, so long as they’re accurately identified and removed from workflows without in-house human intervention.
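A minimal sketch of such a pre-workflow filter is shown below. The `Finding` fields and the risk threshold are illustrative assumptions, not a real triage policy:

```python
# Sketch of a noise filter that removes low-quality findings before they
# reach the security team. Fields and threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class Finding:
    valid: bool          # not a false positive
    reproducible: bool   # independently verified
    risk_score: float    # 0.0-10.0

def reaches_security_team(f: Finding, min_risk: float = 4.0) -> bool:
    """Only valid, reproducible findings above a risk threshold pass through."""
    return f.valid and f.reproducible and f.risk_score >= min_risk

findings = [
    Finding(valid=True, reproducible=True, risk_score=8.1),   # helpful: real risk
    Finding(valid=False, reproducible=True, risk_score=9.0),  # harmful: false positive
    Finding(valid=True, reproducible=False, risk_score=2.0),  # harmful: noise
]
actionable = [f for f in findings if reaches_security_team(f)]
```

Only the first finding survives the filter; the false positive and the unreproducible low-risk issue never consume in-house time.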

What does this mean for continuous pentesting?

Most security teams are already dealing with more findings than they can manage. Any proposal to improve program maturity MUST include a way to identify and prioritise real, exploitable, high-risk issues… or the game is up.

But wait, isn’t AI changing everything?

Anthropic’s Claude Mythos Preview has reportedly found thousands of zero-day vulnerabilities, but it is not the only frontier model showing cyber capability: the UK AI Security Institute says OpenAI’s GPT-5.5 reached a similar level of performance on its cyber evaluations. OpenAI is also reportedly preparing a more cyber-focused GPT-5.5-Cyber rollout for trusted defenders. That matters because it suggests this is not just a Mythos-specific moment but part of a broader shift in model capability that security teams need to account for.

Whether or not this is true is hotly debated, with detractors questioning everything from the raw numbers to the (lack of) validation. Regardless, it’s clear that AI will play a role in both offensive security and cyber-attacks in the coming years.

The big question is: will these models benefit attackers more than defenders? Right now, it looks as though the answer is yes.

In recent years, the time-to-exploit (TTE) of vulnerabilities has plummeted, with some now exploited in the wild within 48 hours of public disclosure. Estimates vary on how long a typical organisation takes to patch critical vulnerabilities… but it’s fair to say that in most cases it’s considerably more than two days.

Frontier AI models, including Mythos, GPT-5.5 and DeepSeek V4 Pro, will continue to force TTE downward, because they provide attackers with the means to automate discovery and mass exploitation.

However, these models don’t address defenders’ core problem. Why? Because the problem was never “finding lots of vulnerabilities”. The problem is finding exploitable vulnerabilities and accurately prioritising them for prompt remediation based on real-world risk.

And collapsing TTE isn’t the only AI-fuelled challenge for defenders:

  • AI‑assisted code ships faster, but without proper human oversight it may also ship a greater volume of vulnerabilities
  • This is fuelling an explosion of CVEs that is exacerbating vulnerability backlogs
  • AI systems bring new classes of vulnerabilities (eg prompt injection, data leakage, model abuse, agent privilege escalation) that require specialised skills to find and verify

AI’s impact on the scalability of vulnerability discovery is so great that NIST has been forced to implement a new risk-based model for its National Vulnerability Database (NVD). Starting this month, CVEs will be prioritised for enrichment based on whether they appear in CISA’s Known Exploited Vulnerabilities (KEV) Catalog, affect software used by the US federal government, or affect software defined as critical under EO 14028.
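Using the criteria as described, the enrichment rule can be sketched as a simple predicate. The dictionary field names here are assumptions for illustration, not NVD’s actual data model:

```python
def prioritise_for_enrichment(cve: dict) -> bool:
    """A CVE qualifies for enrichment if any of the stated criteria holds:
    it appears in CISA's KEV Catalog, it affects software used by the US
    federal government, or it affects software defined as critical under
    EO 14028. Field names are hypothetical."""
    return (
        cve.get("in_kev_catalog", False)
        or cve.get("federal_use", False)
        or cve.get("eo14028_critical", False)
    )
```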

So what do today's frontier AI models mean for defenders?

Certainly, there are opportunities to use AI tools for vulnerability discovery. However, according to a report by the Cloud Security Alliance, SANS and others, one of the most important steps is to increase focus on the basics: segmentation, identity and access, general IT/security hygiene… and vulnerability management.

Specifically, defenders should:

  1. Streamline and automate vulnerability management processes as far as possible
  2. Prioritise exploitability over pure volume

Hang on, isn’t that already what they needed to do, even before all this AI stuff?

Yes. AI dominates the conversation right now, but its main impact on vulnerability management has been to accelerate longstanding trends: overwhelm, noise and the need for smarter prioritisation.

This is why many organisations are accelerating improvements to their offensive security programs. They don’t need a bigger list of vulnerabilities to patch. They need more speed, more validation, and more precision in identifying and resolving the most dangerous vulnerabilities first.

To that end, organisations need both automated and human-led security testing, along with some of the other capabilities we’ve discussed in this article.

Our approach to continuous testing

Again, Gartner’s vision is directionally strong. Most experts in the offensive security and vulnerability management space have come to similar conclusions. So long as our reservations are addressed, it’s a good starting point for an effective security testing program.

At YesWeHack, our platform and solutions are aligned with Gartner’s vision in several important ways:

1. We combine human and automated testing across continuous and time-bound engagements

This approach is ideal for minimising exposure windows across the board and providing assurance at specific points in time when risk is highest – eg after a major update, infrastructure change or product release.

2. We prioritise issue validation to eliminate noise

Every one of our solutions includes a strict validation process that eliminates low-quality issues. For Bug Bounty, our triage team verifies every report, removing false positives and ensuring your team only receives real, exploitable issues.

For Continuous Pentesting, our expert human pentesters reproduce and validate every finding before they reach your team.

MORE ON THIS SOLUTION Continuous Pentesting with zero false positives: a fully managed, platform-driven approach

Even our fully automated solution, Autonomous Pentesting, prioritises validation. Automated Checkpoints find and validate issues with full attack scenarios, so you’ll only be alerted to genuine, exploitable issues in your attack surface.

MORE ON THIS SOLUTION Introducing Autonomous Pentest: identify actively exploited vulnerabilities across your attack surface

3. Our risk scoring is based on vulnerability intelligence and business context

Even with false positives removed, security teams are swamped with findings. This is why prioritisation is crucial, and why threat intelligence is an important component.

All findings from our solutions are heavily contextualised and prioritised based on exploitability and real-world risk. We use CVSS, EPSS, asset value, KEV and our own vulnerability intelligence to calculate real-world risk scores for every finding.
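As an illustration only (this is not YesWeHack’s actual scoring formula), blending these signals might look like the sketch below, where severity is discounted by exploitation likelihood, scaled by asset importance, and floored for known-exploited vulnerabilities. The weights are arbitrary assumptions:

```python
def risk_score(cvss: float, epss: float, asset_value: float, in_kev: bool) -> float:
    """Illustrative blend of severity (CVSS, 0-10), exploitation likelihood
    (EPSS, 0-1), business context (asset value, 0-1) and known exploitation
    (KEV). Weights are arbitrary assumptions; returns a 0-10 score."""
    base = cvss * (0.5 + 0.5 * epss)               # discount severity by likelihood
    contextual = base * (0.5 + 0.5 * asset_value)  # scale by asset importance
    if in_kev:
        contextual = max(contextual, 9.0)          # known-exploited floors near the top
    return round(min(contextual, 10.0), 1)
```

The key design point is that a modest-severity finding on a crown-jewel asset with active exploitation can outrank a critical-CVSS finding nobody is exploiting, which is exactly the shift from severity-based to risk-based prioritisation.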

That means your team can focus on the highest risk issues without worrying that a genuinely dangerous issue might get missed.

4. We support trigger-based testing and orchestration

Autonomous Pentesting and Continuous Pentesting both combine attack surface scanning with always-on testing. This means testing is triggered automatically when a new asset is added or discovered in your attack surface.

For more involved triggers, our platform provides the tools and workspaces needed to orchestrate all your security testing. For example:

  • Adjusting program scopes and incentives in your Bug Bounty Program
  • Configuring asset coverage and attack scenarios for Autonomous Pentesting
  • Orchestrating pentesting across all your providers using Pentest Management

All of this can be updated at any time in response to changes in your priorities, attack surface, risk profile or IT/business operations.

5. Our platform integrates seamlessly with your tools and workflows

Testing must be built into security and IT workflows. Organisations have struggled with this for years: the need for security to be part of the functions it serves, not “bolted on” at the end.

That’s exactly what our offensive security and exposure management platform was built to support. The YesWeHack platform provides:

  • Full integrations with existing tools and workflows
  • Dashboards and customisable reporting for audit and compliance proofs
  • A unified vulnerability management workspace that combines findings from ALL testing sources

MORE ON OUR VISION Map, test, fix, comply: unveiling our unified approach to offensive security

See the YesWeHack platform in action

If you’re looking to expand or improve your security testing program, YesWeHack can help.

YesWeHack provides a full range of automated and human-led testing capabilities that can be combined and customised to fit your security and compliance needs.

Contact YesWeHack for a no-obligation live demo and review of your testing needs.