Help:Contents/Finding Content/Compression Algorithms
This is a sub-page of Help:Contents/Finding Content.
This article has just been started and needs the article basics added. Help us out and add them. |
More often than not, a game's data can be compressed quite a bit. When dealing with new formats, they often use one of the algorithms listed below.
To do: We need a lot of help expanding upon this. |
Contents
Run Length Encoding-Based
Run Length Encoding (RLE) is a generic compression algorithm that takes in a series of bytes and compresses them based on how many times a particular string of data is repeated. Due to the ease of implementation, this is one of the more common types of compression.
Generic Methods
A typical 8-bit RLE compression method is to continuously read one byte to determine the number of times the next byte is output when decompressed. Another method is to use nibbles to determine this instead, saving space when the number of bytes to output for each piece of data is 16 or less.
PackBits
This is a default compression method used by Mac OS Classic. This can be found in TIFF files. It typically follows as such:
Byte | Action |
---|---|
00- 7F | n + 1 bytes. |
80 | No operation; treated as a header byte. |
81-FF | Read the next byte, and write it to the output 256 - n times. |
Other Types
Used in:
- Game Boy Advance and Nintendo DS BIOS
Dictionary-Based
LZ77 and LZ78
The two oldest dictionary-based compression algorithms. LZ77 and LZ78 can be referred to as LZ1 and LZ2, respectively. Variants of these methods include LZ# (# being the length of the dictionary code entry in bits or referring to the size of the sliding window). These typically replace repeating strings with a small piece of data noting its entry within the dictionary. LZ77 must start at the beginning of the file to decompress and uses a sliding window for the dictionary, while LZ78 can begin anywhere so long as a suitable dictionary is available.
These will typically have fragments of the data, as well as the distance between instances of that data, the length the data goes to, and its associated symbol.
Algorithms using these as their basis include:
- TPACK
- PowerPacker
- NUKE
- MIO0 (Identified by having an MIO0 header)
- Used in early Nintendo 64 games such as Super Mario 64
LZSS
Based on LZ77. Each dictionary entry has a flag noting whether the string is coded or not. It also relies on a sliding window, like in LZ77.
Algorithms using these as their basis include:
- Nintendo LZ10 and LZ11
- Kosinski compression
Used in:
- Game Boy Advance and Nintendo DS BIOS
- Doom and games based on Doom's engine
- Chrono Trigger (SNES)
- Several of Quintet's games, including ActRaiser
LZW
This particular algorithm is used in Unix's own compress program, where it uses a variant called LZC. It's also used in the ubiquitous GIF format. The most common known variation involves fixed-length 12-bit codes, though variants for 9-bit, 15-bit, 16-bit and 18-bit codes have been seen in other programs.
Used in:
- Descent
- Many SCI games
LZFG
A hybrid compression of LZ77 and LZ78. While it does use a sliding window, it only includes completed phrases into its dictionary, making it faster than LZ77.
LZF
Popular for its low memory usage and its speed during compression.
LZO
A block compression algorithm based on LZ77. This particular algorithm focuses on the speed of decompression.
LZMA
Famously used in 7-zip. Versions are also available in Python and PHP.
Entropy
Golomb coding
A variant of this known as Rice coding can also be found.
Shannon-Fano
Not to be confused with Shannon coding.
Arithmetic
This algorithm relies on gathering the number of words within the data, as well as the frequency of them.
Huffman
This is similar to arithmetic coding, but Huffman encodes data based on individual symbols within the data rather than the whole data stream.
Used in:
- Game Boy Advance and Nintendo DS BIOS
- Many SCI games
Hybrid Algorithms
RNC
Rob Northern Compression (RNC) is commonly seen in European-made games, especially those based in the UK. It typically begins with "RNC" in the header. This combines the LZSS and Huffman algorithms. There are two different methods for handling RNC files.
The official compressor/decompressor is Pro-Pack for the Amiga, although other compressors/decompressors have been made for Windows and Linux.
DEFLATE
Famously used in zlib and used for PNGs. This combines the LZ77 and Huffman algorithms.
Miscellaneous
Burrows–Wheeler transform
Used in bzip2. This method re-arranges strings to allow for easier decompression.
Decompress/compression programs
- Basic Compression Library - Supports RLE, LZ77, Rice, Shannon-Fano and Huffman compressions.
- QuickBMS - Universal script based files extractor and reimporter. Supports tons of games and file formats, archives, encryptions, compressions, obfuscations and other algorithms (700 algorithms overall).