In computing, DEFLATE is a lossless data compression algorithm and associated file format that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool. The file format was later specified in RFC 1951.
The original algorithm as designed by Katz was patented as U.S. Patent 5,051,745 and assigned to PKWARE, Inc. As stated in the RFC document, an algorithm producing DEFLATE files is widely thought to be implementable in a manner not covered by patents. This has led to its widespread use, for example in gzip compressed files, PNG image files and the ZIP file format for which Katz originally designed it.
A Deflate stream consists of a series of blocks. Each block is preceded by a 3-bit header:
The stored block option adds minimal overhead, and is used for data that is incompressible.
Most compressible data will end up being encoded using method 10
, the dynamic Huffman encoding, which produces an optimised Huffman tree customised for each block of data individually. Instructions to generate the necessary Huffman tree immediately follow the block header. The static Huffman option is used for short messages, where the fixed saving gained by omitting the tree outweighs the percentage compression loss due to using a non-optimal (thus, not technically Huffman) code.
Compression is achieved through two steps:
Within compressed blocks, if a duplicate series of bytes is spotted (a repeated string), then a back-reference is inserted, linking to the previous location of that identical string instead. An encoded match to an earlier string consists of an 8-bit length (3–258 bytes) and a 15-bit distance (1–32,768 bytes) to the beginning of the duplicate. Relative back-references can be made across any number of blocks, as long as the distance appears within the last 32 KB of uncompressed data decoded (termed the sliding window).