In this last post of the zip bomb series, I’m going to tell you about a new method that has emerged in the last month: overlapping bombs. This type of bomb has achieved the highest decompression ratio to date: from 46 MB to 4.5 petabytes.
This kind of bomb, created by David Fifield, differs from those in the previous posts in one key respect: it does not use recursion. Instead, it overlaps files inside the zip. As a result, it expands completely on the first decompression, so the security measures that antivirus software and operating systems adopted against the previous types of bomb cannot stop it.
In the first post we saw that Zip’s DEFLATE compression method cannot achieve a compression ratio greater than 1:1032. This is why recursion has been used until now. But both the recursive bomb and the quine bomb are harmless if they are decompressed only once rather than recursively. The overlapping bomb, by contrast, produces in a single decompression an output whose size grows quadratically with the size of the input, surpassing DEFLATE’s compression limit.
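The 1:1032 ceiling is easy to observe with Python’s `zlib` module, which implements DEFLATE. This minimal sketch compresses runs of zero bytes (the best case) as a raw DEFLATE stream, the format stored inside zip entries; the function name `deflate_ratio` is only illustrative, and the exact ratios depend on zlib’s version and block sizes, but they approach 1032 without reaching it.

```python
import zlib

def deflate_ratio(n_bytes: int) -> float:
    """Compress n_bytes of zeros with raw DEFLATE and return the expansion ratio."""
    data = bytes(n_bytes)  # a run of zero bytes, the best case for DEFLATE
    # wbits=-15 selects a raw DEFLATE stream (no zlib header or checksum).
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    compressed = compressor.compress(data) + compressor.flush()
    return len(data) / len(compressed)

for size in (10**5, 10**6, 10**7):
    print(f"{size:>11,} zero bytes -> ratio {deflate_ratio(size):.1f}:1")
```

Larger inputs get closer to the limit because the fixed per-stream and per-block overhead is spread over more data.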
To understand how this new technique works, you need to understand the structure of a zip file. A zip consists of the files it contains, each made up of a local file header followed by the file’s content. At the end comes the central directory, which contains another header for each of these files, including a reference (the offset) to that file’s local header.
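The link between the central directory and the local headers can be inspected with Python’s standard `zipfile` module. This sketch builds a two-file archive in memory and, for each central directory entry, follows `ZipInfo.header_offset` to the local file header it points to; the file names and contents are arbitrary.

```python
import io
import struct
import zipfile

# Build a small zip in memory with two uncompressed (stored) entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world!")
raw = buf.getvalue()

# Each central directory entry records where its local file header starts.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    for info in zf.infolist():
        offset = info.header_offset
        # A local header starts with PK\x03\x04 and has 30 fixed bytes;
        # the filename length is stored at byte 26, the name right after.
        signature = raw[offset:offset + 4]
        (name_len,) = struct.unpack_from("<H", raw, offset + 26)
        local_name = raw[offset + 30:offset + 30 + name_len].decode()
        print(info.filename, offset, signature, local_name)
```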
Fifield’s idea was to make all the headers in the central directory point to the same file. That way, even if the zip contains only one file of a few bytes, creating many headers in the central directory makes the decompressor believe there are thousands of files, because every header points to that one file.
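The idea can be modeled with a hand-built zip: one local file and many central directory entries that all point to offset 0. This is a sketch with hypothetical helper names (`local_header`, `central_header`, `full_overlap_zip`, `extract_all`); the field layouts follow the zip specification, and `extract_all` is a deliberately naive reader that performs none of the safety or consistency checks real extractors apply. It only shows that every entry inflates the same data again.

```python
import struct
import zlib

def local_header(name: bytes, crc: int, csize: int, usize: int) -> bytes:
    # Local file header: 30 fixed bytes (method 8 = DEFLATE, date 33 = 1980-01-01) + name.
    return struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8, 0, 33,
                       crc, csize, usize, len(name), 0) + name

def central_header(name: bytes, crc: int, csize: int, usize: int,
                   offset: int) -> bytes:
    # Central directory header: 46 fixed bytes + name; `offset` is where
    # the local file header this entry refers to begins.
    return struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 8, 0, 33,
                       crc, csize, usize, len(name), 0, 0, 0, 0, 0,
                       offset) + name

def full_overlap_zip(n_entries: int, kernel_size: int) -> bytes:
    """One local file, n_entries central directory entries all at offset 0."""
    data = bytes(kernel_size)  # zeros compress best
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw DEFLATE, as in zip
    kernel = comp.compress(data) + comp.flush()
    crc = zlib.crc32(data)
    body = local_header(b"a", crc, len(kernel), len(data)) + kernel
    cd = b"".join(central_header(b"a", crc, len(kernel), len(data), 0)
                  for _ in range(n_entries))
    # End of central directory record (22 bytes, no comment).
    eocd = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, n_entries, n_entries,
                       len(cd), len(body), 0)
    return body + cd + eocd

def extract_all(zip_bytes: bytes) -> list[bytes]:
    """Naive reader: inflate every central directory entry, no checks."""
    count, _, cd_offset = struct.unpack_from(
        "<IHHHHIIH", zip_bytes, len(zip_bytes) - 22)[4:7]
    outputs, pos = [], cd_offset
    for _ in range(count):
        f = struct.unpack_from("<IHHHHHHIIIHHHHHII", zip_bytes, pos)
        csize, offset = f[8], f[16]
        pos += 46 + f[10] + f[11] + f[12]  # skip name, extra, comment
        # Skip the local header: 30 bytes + its own name and extra field.
        name_len, extra_len = struct.unpack_from("<HH", zip_bytes, offset + 26)
        start = offset + 30 + name_len + extra_len
        outputs.append(zlib.decompress(zip_bytes[start:start + csize], -15))
    return outputs

bomb = full_overlap_zip(n_entries=50, kernel_size=100_000)
files = extract_all(bomb)
print(len(bomb), "bytes in,", sum(map(len, files)), "bytes out")
```

The compressed kernel is stored once, while each central directory entry adds only 47 bytes, so the output grows with the number of entries at almost no cost in input.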
He has thus achieved a much better ratio than DEFLATE, going from 1:1032 to 1:21277.
Problems from theory to practice
However, this only works in theory, because it creates a conflict between the file and the central directory: the filename in the local file header must match the one in its central directory entry. If the central directory lists “Header 1”, “Header 2”, …, “Header N” but the only local header present is “Header 1”, an error occurs when the decompressor tries to extract the others. Nor can every central directory entry simply use the name “Header 1”, because the system does not allow repeated names.
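A sketch of that name check, using hypothetical helpers: one local file header named `1`, and three central directory names that all point to its offset. Python’s `zipfile`, for instance, raises `BadZipFile` when the name in the central directory and the one in the local header differ; the toy comparison below stands in for that kind of check.

```python
import struct

def local_header(name: bytes) -> bytes:
    # Minimal local file header; CRC and sizes are zero since only the name matters here.
    return struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8, 0, 33,
                       0, 0, 0, len(name), 0) + name

def name_in_local_header(archive: bytes, offset: int) -> bytes:
    # The name length is at byte 26 of the header; the name follows the 30 fixed bytes.
    (name_len,) = struct.unpack_from("<H", archive, offset + 26)
    return archive[offset + 30:offset + 30 + name_len]

archive = local_header(b"1")          # the only local file header, at offset 0
for cd_name in (b"1", b"2", b"3"):    # central directory names, all pointing at 0
    found = name_in_local_header(archive, 0)
    status = "ok" if found == cd_name else f"error: header says {found!r}"
    print(cd_name, status)
```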
Since most of a zip’s size is in the data (here, in “Data 1”), adding as many headers as needed costs little: a local file header takes only 30 bytes plus the filename, that is, 31 bytes with a one-character name. So the next approach tried was to include many local file headers followed by a single block of data.
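The 31-byte figure follows from the fixed part of the header. This sketch transcribes the field layouts of the zip specification into Python `struct` formats (the transcription is an assumption of this example) and adds a one-character filename; the central directory header is included for comparison.

```python
import struct

# Fixed-size parts of the two header types ("<" = little-endian, no padding).
LOCAL_HEADER = "<IHHHHHIIIHH"          # then: filename, extra field
CENTRAL_HEADER = "<IHHHHHHIIIHHHHHII"  # then: filename, extra, comment

name = b"a"  # a one-character filename
local_size = struct.calcsize(LOCAL_HEADER) + len(name)
central_size = struct.calcsize(CENTRAL_HEADER) + len(name)
print(local_size, central_size)            # 31 47
# Even a thousand extra entries cost well under 100 kB of headers.
print(1000 * (local_size + central_size))  # 78000
```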
The problem is that when the first file is extracted, the program finds the next file’s header where it expects data, which again produces an error.
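To see the failure concretely, this sketch hands a local file header to Python’s raw DEFLATE decoder, which is roughly what happens when the extractor expects compressed data but finds the next file’s header instead. The header values (version 20, a file named `b`) are arbitrary choices for illustration.

```python
import struct
import zlib

# Local file header of a hypothetical next file named "b" (values are arbitrary).
next_header = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8, 0, 33,
                          0, 0, 0, 1, 0) + b"b"

try:
    # The first file's extractor expects raw DEFLATE data at this position.
    zlib.decompress(next_header, -15)
    print("decoded without error")
except zlib.error as exc:
    print("decompression failed:", exc)
```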
The fix was to find a way to “escape” the next header, so that the decompressor treats it as data while it remains a valid header that the next central directory entry can still point to. This is possible because DEFLATE, the compression method Zip uses, has non-compressed blocks that are copied verbatim to the output during decompression. Each of these blocks is preceded by a 5-byte header that gives the length of the data to copy. With these headers we can “escape” every header up to the file’s content.
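The escape can be reproduced with Python’s `zlib`, which accepts raw DEFLATE streams when `wbits` is negative. This sketch assumes a byte-aligned non-compressed (stored) block as defined in RFC 1951, section 3.2.4: one byte carrying BFINAL and BTYPE=00 (padded to the byte boundary), then LEN and its one’s complement NLEN. The quoted bytes are a placeholder standing in for a real header, and `stored_block_header` is an illustrative name.

```python
import struct
import zlib

def stored_block_header(length: int, final: bool = False) -> bytes:
    """5-byte header of a DEFLATE stored block: BFINAL/BTYPE byte, LEN, NLEN."""
    return bytes([1 if final else 0]) + struct.pack("<HH", length, length ^ 0xFFFF)

quoted = b"<bytes of the next local file header>"  # placeholder, not a real header
payload = bytes(10_000)
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
kernel = comp.compress(payload) + comp.flush()

# A non-final stored block that "escapes" the quoted bytes, followed by the
# compressed kernel, whose last block zlib marks as final.
stream = stored_block_header(len(quoted)) + quoted + kernel
out = zlib.decompress(stream, -15)
print(out[:len(quoted)])       # the quoted bytes, copied verbatim
print(len(out) - len(quoted))  # 10000
```

Because a stored block ends on a byte boundary, the next block can start immediately after the quoted bytes, which is what lets the escaped header stay intact for the next central directory entry.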
Let’s see how this works with N=3. The program starts with the first central directory header, “CD Header 1”, which points it to “Header 1”, and it processes the data that follows “Header 1”. There it finds the header “Escape next 2 headers”, so it copies “Header 2”, “Escape next header” and “Header 3” to the output verbatim. Then it reaches “Data 1” and decompresses it to the output. Next, it moves on to the following file by consulting the next central directory header, “CD Header 2”, which points it to “Header 2”. It processes the data that follows, finds the header “Escape next header”, copies “Header 3” to the output and decompresses “Data 1”. Finally, the program goes to the last central directory header, “CD Header 3”, which points it to “Header 3”, and decompresses “Data 1” to the output.
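The N=3 walk-through can be simulated byte for byte. In this sketch the headers are placeholder strings padded to 31 bytes, since a quoted header is just bytes to the decompressor; the layout mirrors the simplified description in this post and is not necessarily the exact layout Fifield’s generator produces. `escape` is a hypothetical helper.

```python
import struct
import zlib

def escape(length: int) -> bytes:
    # Non-final DEFLATE stored-block header: copy the next `length` bytes verbatim.
    return b"\x00" + struct.pack("<HH", length, length ^ 0xFFFF)

# Placeholder 31-byte local file headers (a real one is 30 bytes + a 1-byte name).
h1, h2, h3 = (f"Header {i}".encode().ljust(31, b".") for i in (1, 2, 3))

payload = bytes(100_000)
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
data1 = comp.compress(payload) + comp.flush()

e2 = escape(len(h3))                      # "Escape next header"
e1 = escape(len(h2) + len(e2) + len(h3))  # "Escape next 2 headers"
layout = h1 + e1 + h2 + e2 + h3 + data1

# Each file's data starts right after its own local header.
starts = [len(h1), len(h1 + e1 + h2), len(h1 + e1 + h2 + e2 + h3)]
for i, start in enumerate(starts, 1):
    out = zlib.decompress(layout[start:], -15)
    # Escaped bytes per file: 67, 31 and 0.
    print(f"file {i}: {len(out) - len(payload)} escaped bytes + {len(payload)} data bytes")
```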
In this example there are 3 local file headers of 31 bytes, 2 escape headers of 5 bytes, the compressed data, and 3 central directory headers, which for simplicity we count as 1 byte each (a real central directory header takes 46 bytes plus the filename). If “Data 1” holds 1,000 bytes of compressed data, the total is 1,106 bytes. The data is all zeros, to get DEFLATE’s maximum compression ratio, so it decompresses to 1,032,000 bytes. Repeated 3 times, that gives 3,096,000 bytes. Adding the escaped headers copied to the output (67 bytes for the first file and 31 for the second), the total is 3,096,098 bytes from 1,106 bytes.
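The tally can be written as a small function. The sizes are the simplified values of this example (31-byte local headers, 5-byte escape headers, 1-byte central directory entries, an exact 1:1032 ratio), passed as parameters so they can be changed, for instance to the real 47-byte central directory header for a one-character name. `overlap_sizes` is an illustrative name.

```python
def overlap_sizes(n: int, data_size: int, lfh: int = 31, esc: int = 5,
                  cdh: int = 1, ratio: int = 1032) -> tuple[int, int]:
    """(zip size, output size) for the simplified N-file layout of this example.

    Assumed sizes: local header `lfh`, escape header `esc`, central
    directory header `cdh`; the data expands by exactly `ratio`.
    """
    zip_size = n * lfh + (n - 1) * esc + data_size + n * cdh
    # File i copies the local headers of files i+1..n, plus the escape
    # headers between them, verbatim to its output.
    escaped = sum((n - i) * lfh + max(n - i - 1, 0) * esc
                  for i in range(1, n + 1))
    out_size = n * data_size * ratio + escaped
    return zip_size, out_size

print(overlap_sizes(3, 1000))          # (1106, 3096098)
print(overlap_sizes(3, 1000, cdh=47))  # (1244, 3096098)
```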
David Fifield optimized this construction to the point where the output grows quadratically with the input: a 42 kB zip expands to 5.5 GB, and a 46 MB zip expands to 4.5 PB.
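Why quadratic? A back-of-envelope model, not Fifield’s exact analysis: with n files and a compressed kernel of k bytes, the zip is about k + n·h bytes (h being the per-file header overhead) and the output about n·k·1032 bytes. For a fixed zip size S, the product n·k peaks when k = S/2 and n = S/(2h), so the output is about 1032·S²/(4h). The sketch assumes h = 31 + 5 + 47 bytes (local header, escape header, real central directory header) and ignores the escaped headers, which only add to the output.

```python
def estimated_output(input_size: float, per_file_overhead: float = 31 + 5 + 47,
                     ratio: float = 1032) -> float:
    """Rough output size for a zip of `input_size` bytes under the model above."""
    k = input_size / 2                        # bytes spent on the compressed kernel
    n = input_size / (2 * per_file_overhead)  # number of overlapping files
    return n * k * ratio

for size in (10_000, 20_000, 40_000):
    print(f"{size:>6} bytes in -> ~{estimated_output(size):,.0f} bytes out")
```

Doubling the input roughly quadruples the output, which is the quadratic behavior described above; the real construction has more overheads than this model.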
You can download both the zip bombs and the source code of the program that generates them from David Fifield’s blog. But be careful: at the time of writing they still work, and if you try to decompress one, your computer will hang. The blog also explains the whole construction and optimization process in more detail.
With this post I conclude the zip bomb series, which has covered recursive bombs, quine bombs and overlapping bombs.