LZ77

From Pin Eight
Jump to: navigation, search
In a nutshell: Compress data by copying previous data forward.

LZ77 is a lossless data compression technique described by Abraham Lempel and Jacob Ziv in a 1977 article in IEEE Transactions on Information Theory. It acts as a generalization of run-length encoding, which keeps track of one previous code unit, such as a byte, and allows the data stream to repeat that code unit for a "run" of N times. LZ77 keeps track of a history of 256 or more code units, and a "run" copies N code units starting from any point D in the history buffer relative to the current position. This allows an RLE-style run (e.g. D = -1, N = 10), but it also allows repeating a sequence of code units (e.g. D = -5, N = 10) or repeating a previously seen string from the history buffer (e.g. D = -35, N = 5).

The original formulation of LZ77 strictly alternated literal code units and runs. Most new codecs based on LZ77 have some indication as to whether runs or literal code units follow. One popular variant, LZSS, uses a control byte to say whether the next eight things in the bitstream will be runs or single literal bytes. Both the Allegro library and the Game Boy Advance BIOS can decode LZSS.

A lot of games especially on the Super NES and Game Boy Advance used codecs based on LZ77. The DEFLATE codec used in .zip and .png formats uses LZ77 combined with Huffman coding of D, N, and byte values.

As with RLE, the code units used with LZ77 and LZ78 can be larger or smaller than 1 byte. High- or true-color images may use code units larger than a byte, and indexed color images may use smaller ones.

The performance of LZ77 depends to some extent on the size of the history buffer. DEFLATE uses a 32 KiB history, and modern LZ77 family codecs such as RAR and LZMA use a much longer buffer. A microcontroller with very little RAM, on the other hand, may have room for only 256 bytes of history, which may hinder compression performance.

External links