If you appreciate the work done within the wiki, please consider supporting The Cutting Room Floor on Patreon. Thanks for all your support!

Help:Contents/Finding Content/Compression Algorithms

From The Cutting Room Floor
Jump to navigation Jump to search

This is a sub-page of Help:Contents/Finding Content.

Cactus 2.0!
This article has just been started and needs the article basics added.
Help us out and add them.

More often than not, a game's data can be compressed quite a bit. When dealing with new formats, they often use one of the algorithms listed below.

Hmmm...
To do:
We need a lot of help expanding upon this.

Run Length Encoding-Based

Run Length Encoding (RLE) is a generic compression algorithm that takes in a series of bytes and compresses them based on how many times a particular string of data is repeated. Due to the ease of implementation, this is one of the more common types of compression.

Generic Methods

A typical 8-bit RLE compression method is to continuously read one byte to determine the number of times the next byte is output when decompressed. Another method is to use nibbles to determine this instead, saving space when the number of bytes to output for each piece of data is 16 or less.

PackBits

This is a default compression method used by Mac OS Classic. This can be found in TIFF files. It typically follows as such:

Byte Action
00- 7F n + 1 bytes.
80 No operation; treated as a header byte.
81-FF Read the next byte, and write it to the output 256 - n times.

Other Types

Used in:

  • Game Boy Advance and Nintendo DS BIOS

Dictionary-Based

LZ77 and LZ78

The two oldest dictionary-based compression algorithms. LZ77 and LZ78 can be referred to as LZ1 and LZ2, respectively. Variants of these methods include LZ# (# being the length of the dictionary code entry in bits or referring to the size of the sliding window). These typically replace repeating strings with a small piece of data noting its entry within the dictionary. LZ77 must start at the beginning of the file to decompress and uses a sliding window for the dictionary, while LZ78 can begin anywhere so long as a suitable dictionary is available.

These will typically have fragments of the data, as well as the distance between instances of that data, the length the data goes to, and its associated symbol.

Algorithms using these as their basis include:

  • TPACK
  • PowerPacker
  • NUKE
  • MIO0 (Identified by having an MIO0 header)

LZSS

Based on LZ77. Each dictionary entry has a flag noting whether the string is coded or not. It also relies on a sliding window, like in LZ77.

Algorithms using these as their basis include:

  • Nintendo LZ10 and LZ11
  • Kosinski compression

Used in:

LZW

This particular algorithm is used in Unix's own compress program, where it uses a variant called LZC. It's also used in the ubiquitous GIF format. The most common known variation involves fixed-length 12-bit codes, though variants for 9-bit, 15-bit, 16-bit and 18-bit codes have been seen in other programs.

Used in:

  • Descent
  • Many SCI games

LZFG

A hybrid compression of LZ77 and LZ78. While it does use a sliding window, it only includes completed phrases into its dictionary, making it faster than LZ77.

LZF

Popular for its low memory usage and its speed during compression.

LZO

A block compression algorithm based on LZ77. This particular algorithm focuses on the speed of decompression.

LZMA

Famously used in 7-zip. Versions are also available in Python and PHP.

Entropy

Golomb coding

A variant of this known as Rice coding can also be found.

Shannon-Fano

Not to be confused with Shannon coding.

Arithmetic

This algorithm relies on gathering the number of words within the data, as well as the frequency of them.

Huffman

This is similar to arithmetic coding, but Huffman encodes data based on individual symbols within the data rather than the whole data stream.

Used in:

  • Game Boy Advance and Nintendo DS BIOS
  • Many SCI games

Hybrid Algorithms

RNC

Rob Northern Compression (RNC) is commonly seen in European-made games, especially those based in the UK. It typically begins with "RNC" in the header. This combines the LZSS and Huffman algorithms. There are two different methods for handling RNC files.

The official compressor/decompressor is Pro-Pack for the Amiga, although other compressors/decompressors have been made for Windows and Linux.

DEFLATE

Famously used in zlib and used for PNGs. This combines the LZ77 and Huffman algorithms.

Miscellaneous

Burrows–Wheeler transform

Used in bzip2. This method re-arranges strings to allow for easier decompression.

Decompress/compression programs

  • Basic Compression Library - Supports RLE, LZ77, Rice, Shannon-Fano and Huffman compressions.
  • QuickBMS - Universal script based files extractor and reimporter. Supports tons of games and file formats, archives, encryptions, compressions, obfuscations and other algorithms (700 algorithms overall).