‘A force multiplier’, small v large models – hacker roundup

What do the fast-improving capabilities of frontier LLMs mean for the future of Bug Bounty? It’s a question many hunters have been asking since the arrival of Claude Mythos. 🤖

Xclow3n, the ‘Chaos Engineer’, is confident that hunters are as relevant as ever. The hacker tested four AI-assisted approaches for finding vulnerabilities over the course of a week and found “14 confirmed vulns in one target in 20 minutes”, but “also burned time on an approach that found nothing useful”. Most of his findings “fell apart” during exploitation. “AI is fast at coverage, hypothesis generation, and code analysis. It’s bad at impact assessment, validation, and knowing what’s actually exploitable,” he wrote. “Every model inflated findings. The researcher’s judgement was the difference between noise and CVEs every single time. AI doesn’t replace security researchers – it’s a force multiplier, not a replacement.” ⚡

Hunters in the loop

YesWeHack-registered hunter Aituglo echoed these sentiments, arguing that “our expertise still lets us steer AI toward the right lead, to know where to dig.” The hacker envisions augmented hunters finding “complex bugs faster than ever” while “an ocean of noise generated by poorly piloted agents” undermines broader security efforts. Aituglo noted that his fellow hunters were all “getting a Claude Max subscription”. This view chimes with the findings of our 2026 report: 91% of hunters use AI tools for Bug Bounty, and 94% observe tangible benefits such as faster bug discovery, more complex vulnerabilities and better pattern recognition across large attack surfaces. That survey was conducted six months ago and, given how fast things are moving, the figure is likely higher now. 📊

According to the estimable LiveOverflow, smaller LLMs can be more cost-effective for security researchers than large models. The TL;DR of his research, in which he used two 0-days in oauth2-proxy to benchmark his Hacktron scanning pipeline and compare different models, reads: “If a large model finds a 0-day with 90% probability, and a small model with 50% probability, but the small model costs 10x less, it is better to use the small model.” 🧠
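To see why, here’s a rough back-of-the-envelope sketch. It’s our own illustration, not LiveOverflow’s code, and it assumes a fixed cost per run, independent runs (so you can simply rerun the cheap model until it hits) and made-up cost units rather than real API prices.

```python
# Back-of-the-envelope sketch of the small-vs-large model argument.
# Assumptions (ours, for illustration only): fixed cost per run, independent
# runs, and arbitrary cost units rather than real API prices.


def cost_per_find(p_find: float, cost_per_run: float) -> float:
    """Expected spend per 0-day found: runs until first success average 1/p."""
    return cost_per_run / p_find


def p_at_least_one(p_find: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs finds the bug."""
    return 1 - (1 - p_find) ** runs


LARGE = {"p_find": 0.9, "cost_per_run": 10.0}  # 90% hit rate, 10x the price
SMALL = {"p_find": 0.5, "cost_per_run": 1.0}   # 50% hit rate, 1 unit per run

large_cpf = cost_per_find(**LARGE)
small_cpf = cost_per_find(**SMALL)

# The price of one large-model run buys this many small-model runs.
budget = LARGE["cost_per_run"]
small_runs = int(budget // SMALL["cost_per_run"])

print(f"large model: {large_cpf:.2f} units per find")
print(f"small model: {small_cpf:.2f} units per find")
print(f"{small_runs} small runs: P(find) = {p_at_least_one(SMALL['p_find'], small_runs):.4f}")
print(f"1 large run:   P(find) = {p_at_least_one(LARGE['p_find'], 1):.4f}")
```

With these illustrative numbers, the small model works out at 2 units per find versus roughly 11 for the large one, and ten cheap runs give a better than 99.9% chance of a hit for the price of a single large-model run. In practice runs aren’t fully independent (a model may consistently miss the same bug), so treat this as a best case for repeated cheap runs.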

Despite an expansion of our platform’s capabilities, Bug Bounty remains a cornerstone of our evolving vision. YesWeHack believes crowdsourced security testing will become more, rather than less, valuable in an age of rapidly improving AI. Aituglo is right: highly skilled lateral thinkers who can put LLM output into context, working with AI, are far superior to non-experts working with AI. In fact, the latter combination is worse than nothing, as the profusion of slop reports demonstrates. Even the companies building LLMs recognise the value of crowdsourced testing: the biggest AI labs continue to invest heavily in Bug Bounty Programs. ✅

Mythos overhyped?

Of course, many experts believe Anthropic overhyped the capabilities of Mythos. They include Davi Ottenheimer, an AI security and post-quantum cryptography expert, who said the 244-page system card for Mythos does not substantiate the framing of “thousands” of zero-days, and lacks CVE/CVSS data, severity buckets, disclosure timelines, false-positive rates, independent reproduction and tooling comparisons. “The Mythos system card tested the model against small-scale enterprise networks with no active defenses and the model succeeded,” he writes. “The same document tested the model against a properly configured sandbox with modern patches and the model failed.” 🔎

AI is not just accelerating the expansion of attack surfaces but also giving hunters vulnerability-rich environments to explore, findings from The State of Vibe-Coded Security report suggest. Commenting on the report, one Redditor said: “It’s a classic ‘senior-looking code, junior-level security’ trap. We're basically entering an era where the attack surface isn't just bad code, but the complete lack of architectural awareness from builders who can deploy in an afternoon without knowing what a JWT even is.” Similarly, ‘signalblur’ has expressed concern about “the level of access these agents are being given, and the fact that the attack surface they introduce still isn’t well understood.” He was a little less worried about their offensive capabilities, as the title of his piece implies: ‘Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary’. Another Redditor described the piece as “a really well grounded article in a time where every news title is more sensational than the last.” 🧩

A fresh batch of hunter interviews to flag now. On YouTube we have krevetk0, who prioritises societal benefits when choosing scopes, and sunshinefactory, who talks about hacking IoT targets, among other topics. We also have blog writeups featuring the aforementioned Aituglo, who credits a background in development for knowing where to find bugs, and Wlayzz, who talks about the need for hunters to embrace AI. 🤝

It’s a bumper edition for our own technical writeups. Alex Brumen’s previous research was praised by a leading PortSwigger researcher as “outstanding” and “the best thing I've read in months”. 🔥 Our researcher enablement specialist has now published a sequel that shows how unexpected Python behaviours can be abused to achieve path traversal and even RCE. 🐍 First presented at the winter edition of NahamCon 2025, ‘Python Pitfalls: Turning Developer Mistakes into Vulnerabilities’ dives into path manipulation issues in os.path.join, URL handling quirks in urljoin, class pollution in Python object handling, and several other real-world exploitation techniques. Alex also penned a new guide to OS command injection, with detection tips and exploitation walkthroughs for direct, blind, out-of-band, time-based and second-order techniques, plus a deep dive into open source code analysis that applies taint analysis, CodeQL queries and dynamic validation to a real target. Finally, we recently documented a critical auth bypass in WordPress Azure AD SSO, stemming from missing OIDC id_token validation, covering the root cause, patch, PoC, threat landscape and mitigation steps. 🔐
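For readers who haven’t met the first two pitfalls before, here’s a minimal illustration. It’s a sketch we wrote ourselves rather than material from Alex’s writeup, and it assumes POSIX path semantics (os.path is posixpath on Linux and macOS). UPLOAD_DIR, the hostnames and the safe_join helper are hypothetical.

```python
# Illustrative sketch of two Python pitfalls (our own examples, not taken
# from the 'Python Pitfalls' writeup). Assumes POSIX path semantics;
# UPLOAD_DIR, the hostnames and safe_join() are hypothetical.
import posixpath  # os.path is posixpath on Linux and macOS
from urllib.parse import urljoin

UPLOAD_DIR = "/srv/uploads"

# Pitfall 1a: an absolute component makes join() discard the base entirely.
print(posixpath.join(UPLOAD_DIR, "/etc/passwd"))  # /etc/passwd

# Pitfall 1b: "../" segments survive join() and normalise out of the base.
print(posixpath.normpath(posixpath.join(UPLOAD_DIR, "../../etc/passwd")))  # /etc/passwd

# Pitfall 2: a scheme-relative "//host" reference replaces the host in urljoin().
print(urljoin("https://api.example.com/v1/", "//evil.example/x"))  # https://evil.example/x


def safe_join(base: str, user_path: str) -> str:
    """Join user input onto base and reject anything that lands outside it.

    Lexical check only: it does not resolve symlinks (posixpath.realpath
    would be needed for that).
    """
    base = posixpath.normpath(base)
    candidate = posixpath.normpath(posixpath.join(base, user_path))
    # commonpath() compares whole path components, so "/srv/uploads-evil"
    # is not treated as being inside "/srv/uploads".
    if posixpath.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


print(safe_join(UPLOAD_DIR, "avatars/me.png"))  # /srv/uploads/avatars/me.png
for attempt in ("/etc/passwd", "../../etc/passwd", "../uploads-evil/x"):
    try:
        safe_join(UPLOAD_DIR, attempt)
    except ValueError as err:
        print("blocked:", err)
```

The containment check uses commonpath rather than a plain string startswith, which would wrongly accept a sibling directory such as /srv/uploads-evil.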

If you prefer something more interactive, our current monthly CTF challenge is Deadbolt. The challenge, which is open for submissions until 4 June, reads: “A private marketplace has just launched – exclusive to the first 10 customers who received a license key at launch. Members can upload and distribute plugins for any Node.js project. You weren't one of the lucky ten.” For inspiration, check out the winners of YesWeHack merch and the best solution submitted for the previous challenge, Bucket Vault. The solution to Dojo challenge #49, Secret Manager, is explained in our latest Talkie Pwnii video. 🎥

Leaderboard

As for the high achievers on real Bug Bounty Programs, Rabhi occupies the top spot, as usual, on both the overall 2026 leaderboard and the Q2 rankings, but not, so far, for the month of May. Well done to the hunters in the silver and bronze medal positions: Edra and drak3hft7 for 2026 overall, m0kr4n3 and Laluka for Q2, and Vozek and Elweth for May. 🥇

We’ll conclude as ever with a roundup of other stellar research we’ve spotted since the last edition:

🔬 Claude Code Found a Linux Vulnerability Hidden for 23 Years - Michael Lynch

🔬 Most bug bounty writeups are recycled. Real bugs are hiding in the specs – sin99xx

🔬 Google API Keys Weren't Secrets. But then Gemini Changed the Rules – Joe Leon

🔬 ClawMutiny: We Audited 1,620 OpenClaw Skills. The Leading Scanner Missed 91% – Oathe.ai

🔬 MAD Bugs: "cat readme.txt" is not safe in iTerm2 – Calif

🔬 Cast Attack: A New Threat Posed by Ghost Bits in Java – Black Hat Asia presentation slides from Xinyu Bai, Zhihui Chen and Zongzheng Zheng

🔬 Breaking Pingora: HTTP Request Smuggling & Cache Poisoning in Cloudflare's Reverse Proxy – Xclow3n

🔬 QUIC-er Races: HTTP/3 won’t save you from TOCTOU vulnerabilities – Mohammad Amin Nasiri, Efstratios Chatzoglou & Georgios Kambourakis

🔬 The woes of sanitizing SVGs – Thomas Weber (GarboMuffin)

🔬 Achieving Deterministic Prompt Injection Through Client-Side Feedback Loops – XSSDoctor

🤘 Meet the YesWeHack Team 🤘

📍 BSides Tampa | Florida, US | 16 May

📍 Lux'Hack | Luxembourg | 19 May | featuring ‘AI in the Hackers' Loop’ keynote from our CEO, Guillaume Vassault-Houlière

📍 Infosecurity Europe | London, UK | booth F139 | 2-4 June

📍 ‘Bugs & Beers’ with YesWeHack | London, UK | 2 June | informal get-together following Infosecurity Europe, featuring how DNV runs its program with George Medhurst, and how hackers operate in the age of AI with Alex Brumen, aka Brumens

📍 Genev'Hack | Geneva, Switzerland | 9 June | featuring ‘AI in the Hackers' Loop’ keynote from our CEO, Guillaume Vassault-Houlière

📍 Congrès du coTer Numérique | Reims, France | booth 178 | 23-24 June

📍 European Cyber Security Organisation (ECSO) 10-year anniversary event | Brussels, Belgium | 24-25 June

📍 OWASP AppSec | Vienna, Austria | booth S12 | 25-26 June

And that’s a wrap until our next edition in July – happy hunting in the meantime! 👊

Read this monthly roundup of content aimed at ethical hackers even sooner by subscribing to Bug Bounty Bulletin.

Are you a CISO, other security professional or security-conscious dev? Check out our CISO-focused sister newsletter, CrowdSecWisdom – bringing you news, insights and inspiration around offensive security topics like Bug Bounty, vulnerability disclosure and management, pentest management and attack surface protection.
