By Pino on 2024-04-25 in Retrotainment
How does Retrotainment Games' Full Quiet squeeze a vast open world into an NES cartridge half a megabyte in size? It uses several layers of compression to squash the world as flat as a pancake.
Data compression refers to transformations on repetitive data to reduce the storage size while allowing the original data to be recovered, either exactly or approximately. It allows a large adventure to be stored in a program cartridge of modest capacity. The engine of Full Quiet takes advantage of various forms of repetition to store its world in what by 2020s standards is a tiny storage medium.
Look at the repetition in the following 27-character string of text:
the fat cat sat on the mat
The 3-character sequence at
, including the trailing space, appears four times in the string. We can replace it with a shorter code and store what the code expands to. This definition consists of the abbreviation @
, the expansion at
, and a symbol to mark the end of the expansion |
. It totals 5 characters: @at |
.
Adding the definition and the compressed text produces a 24-byte string:
@at |the f@c@s@on the m@
The sequence the
likewise appears more than once. However, if we were to add another definition #
that expands to the
, that wouldn't save any bytes because we'd still need to store the definition.
@at |#the |#f@c@s@on #m@
Text compression doesn't work well with a single sentence in isolation. It works better with a longer text, where each entry in the dictionary represents something that appears much more often.
Much of the text in Full Quiet is compressed, including radio transmission dialogue, the "subtunnel" text during a dream, occasional dialogue with Pap, and the credits. This uses a codec called DTE, or "digram tree encoding", which has also been called "digram coding", "dual tile encoding", or "byte pair encoding" in various sources. DTE turns the 128 most common sequences of 2 characters into a reference to that pair. Because ASCII uses only code units 0 through 127, this leaves the rest of the byte range (128 through 255) for references. The "tree" part comes in when one or both of each pair of characters is itself a reference to another pair earlier in the dictionary.
The dictionary takes 256 bytes to store 128 pairs. The first 37 pairs might be as follows:
128: "E " 129: "TH" 130: "S " 131: "T " 132: "IN" 133: "ER" 134: "OU" 135: "AN" 136: "D " 137: "ON" 138: "TO" 139: "LL" 140: "RE" 141: " ",129 ; that is, " TH" 142: ". " 143: "Y " 144: "ST" 145: "EN" 146: "Y",134 ; that is, "YOU" 147: 132,"G" ; that is, "ING" 148: "RI" 149: " H" 150: ".." 151: " W" 152: " S" 153: "OW" 154: "I",130 ; that is, "IS " 155: " C" 156: "LA" 157: "OR" 158: "HA" 159: " B" 160: " A" 161: "CH" 162: "IT" 163: 129,128 ; that is, "THE " 164: "'",130 ; that is, "'S "
With this dictionary, the string PULL THE ORANGE RING
(20 bytes) can be encoded as PU<139> <163><157><135>G<128>R<147>
(11 bytes). (The notation <num>
denotes a reference byte.) Replacing each reference with its definition, such as <139>
with LL
, results in PULL <129><128>ORANGE R<132>G
, which in turn expands to PULL THE ORANGE RING
.
This method compresses the 21 transmissions, 17 subtunnels, 2 Pap dialogue scripts, and the credits from 12,704 bytes to 7,822 bytes with the dictionary included, a savings of 38 percent. There exist other, stronger methods of text compression. Many of them rely on building a dictionary in RAM, something a personal computer has quite a bit of but the NES does not. We chose DTE because it is relatively quick to decode with a dictionary in ROM.
Modern computers usually operate in a 24-bit color space. Each dot has 8 bits for red level, 8 bits for green level, and 8 bits for blue level. Full Quiet has a total of 238 different maps, covering 77,418,496 dots. Each map has daytime, evening, and night. If it stored the exact color of each dot, that would total over 232 megabytes per daypart, or over 696 megabytes in total.
The NES PPU can't even generate all those colors. It can generate about 52 different colors. This lets us reduce the colors from 24 bits to 6 bits, or 58 megabytes per daypart, or 174 megabytes in all. For each area, we also store a palette translation table that maps each day color to its corresponding evening and night colors. These tables total 351 bytes, letting us store only the images for the day.
We use no more than 16 of the 64 colors in each map. Each map has a palette that maps indices 0-15 to PPU color codes 00 to 3F, and this can be packed into 10 bytes per map or 2380 bytes in total. Using a palette lets us store 4 bits per pixel instead of 6, and the daypart palette translation operates on this palette, leaving us with roughly 39 megabytes.
The NES was used with televisions of the 1970s and 1980s, which are not quite as strong at showing fine color detail compared to fine brightness detail. We can choose one universal background color (or "backdrop" color) for each map and then four different subpalettes of three colors, where each set applies to a 16 by 16 dot area. This uses only numbers 0 to 3, two bits per pixel, plus the attribute data used to choose a subpalette for each area. This reduces the map data closer to 19 megabytes, at the cost of occasional visible edges between 16 by 16 dot areas where the tool cannot find a close enough set of three colors. These edges are called "attribute clash." Because over 99 percent of NES games use 16 by 16 dot attribute areas, skilled NES artists have to find ways to disguise clash when they cannot avoid it.
To sum up so far: We're storing 77,418,496 pixels across three dayparts in 19,354,624 bytes for pixels, 75,604 bytes for attributes, 2,380 bytes for palettes, and 351 bytes for daypart palette translation, for a total of 19,432,959 bytes. That's slightly more than 1/12 of our original estimate, though still not small enough for the NES.
The NES is designed to work with graphics divided into "tiles" or "characters", each 8 by 8 pixels in size. The term "character" comes from old video terminals that displayed letters and numbers as a grid of fixed-size glyphs, each 8 by 8 pixels.
Say we find 256 different 8 by 8 pixel tiles for each map. Each tile has 64 pixels and, with two bits per pixel, can be defined in 16 bytes. Then we can store the map as a grid of tile references. Instead of 19 megabytes for pixels, now we have 974,848 bytes for tiles, 1,209,664 bytes for tilemaps, and still 75,604 bytes for attributes.
A lot of maps in the game have similar graphics, seeing as they take place in similar-looking parts of the world. There are 80 mapsets, about one per three maps, and all maps in a mapset share the same tiles. If all mapsets used all 256 available tiles, there would be 25,600 tiles. However, most don't, leaving a total of 16,821 tiles, which can be stored in 269,136 bytes.
Like four of Capcom's Mega Man games, Full Quiet uses CHR RAM. This means while loading a map, it can combine tiles from the individual mapset with tiles taken from a pool of common tiles shared by more than one mapset. This lets us combine 9535 shared tiles into a common pool of 3054 tiles, where 1090 have 3 or more colors and 1964 have only 2 colors. Of these 1964 2-color tiles, only 1218 turn out to be fully distinct. The rest are the same as another 2-color tile with the same shape and different color indices, such as 0 and 1 vs. 0 and 2 or 2 and 3. We can store one of these, using only 1 bit per pixel (8 bytes per tile).
Now we're left with 7286 tiles unique to a mapset at 16 bytes each (116,576 bytes), 9535 references to the tile pool at 2 bytes each (19,070 bytes), 1090 2-bit shared tiles at 16 bytes each (17,440 bytes), and 1218 1-bit shared tiles at 8 bytes each (9744 bytes), totaling 162,830 bytes for background tiles. We still have 1,209,664 bytes for tilemaps, and 78,335 bytes for attributes and palette-related data, for a total of 1,450,829 bytes, but we've got one more trick to do with the tiles.
Run-length encoding is a method of compressing data by combining consecutive repeats of a symbol. Consider the following fragment of a catalog:
Black pens, box of twenty ... $2.10 Blue pens, box of twenty ... $2.35 Red pens, box of twenty ... $2.50
It's tedious to rewrite the entire words on each line. It was even more tedious on stone tablets. For this reason, ditto marks were invented prior to 600 BCE.
Black pens, box of twenty ... $2.10 Blue " " " " ... $2.35 Red " " " " ... $2.50
Or say you have a picture of a chinchilla drawn with red, yellow, green, and blue dots. The background is green, the outline is blue, and the fur is mostly yellow with some red spots. Cut one row of 48 dots from the image:
GGGBBYYY YYYYYYYY BGGGBYYY YRRRYYYY YRRRRYBG GGGGGGGG
If a color is identical to the color to the left, you can replace it with a ditto mark.
G""B"Y"" """""""" BG""BY"" "R""Y""" "R"""YBG """"""""
There are several ways to encode which pixels are literal and which are repeats. PB8, the run-length encoding used in Full Quiet, breaks data into packets of 8 bytes. Each packet is compressed to a a 1-byte header with a 1 bit for repeats and 0 for anything else, followed by those byte values that are not repeats.
01101011 11111111 00110011 10110111 10111000 11111111
G B Y BG BY R Y R YBG
The 48 dots have been reduced to 6 packets totaling 19 bytes:
01101011,G,B,Y
11111111
00110011,B,G,B,Y
10110111,R,Y
10111000,R,Y,B,G
11111111
However, this is not quite the format of NES tiles. The tile format is bit-packed, with each byte containing parts of the 8 dots that make up one row. It's as if each 8 by 1 dot sliver of a tile were compared with the sliver immediately above it. In addition, each tile consists of two 8-byte planes: one for bit 0 of all pixels and one PB8 packet for bit 1 of all pixels. This neatly, as many combinations of 2 colors (0 and 1, 2 and 3, 0 and 2, or 1 and 3) compress one plane to near nothing.
This compression turned out to be quite effective on tilesets given its simplicity. Though the reused tiles cannot be compressed without compromising ability to quickly retrieve them from the pool, the 7286 tiles unique to a mapset compress from 116,576 bytes to 81,118 bytes, saving 30.4 percent.
We're done with tiles, I promise. Let's do maps.
Metatiles are made out of tiles, just as tiles are made out of dots. Groups of four tiles, 8 by 8 dots each, are combined into one 16 by 16 dot small block. Groups of four small blocks are combined into one 32 by 32 dot medium block, which is as tall as the protagonist. Attribute data, choosing which part of the palette, is also stored in medium blocks. Groups of four medium blocks are combined into one 64 by 64 dot large block. Unpacking a tilemap is similar to unpacking DTE, except there are always three layers, and each mapset has its own metatile dictionary that applies to its maps.
Just as there are up to 256 tiles per tileset, there are also up to 256 small blocks, up to 256 medium blocks, and up to 256 large blocks. Unlike Sunsoft's Blaster Master, which has the same three-level structure, the art style of Full Quiet aggressively hides the tile grid. This means that we do occasionally have to send a map back to the art department when the compression tool tells us a limit was exceeded.
Say we have a map as big as the Cliffs, which are 2048 by 2048 dots. Storing a raw tilemap for the Cliffs would take 65,536 bytes. If we store 256 small block definitions at 4 bytes each, 188 medium block definitions at 5 bytes each, and 108 large block definitions at 4 bytes each, that totals 2396 bytes. Then the map becomes a grid of 32 by 32 large block references, to be stored in 1024 bytes. The map is compressed to 3420 bytes, or 5.2 percent of its original size.
We almost never use all 256 large blocks in a mapset. This gives us room to store variants of large blocks in the table. For example, a particular mapset can incorporate large blocks with a particular passage closed, half-open, or open. Then the map loader can choose one at runtime, or even switch an entire map between morph states 0, 1, or 2. The tool sorts the large blocks based on whether or not they're part of a morphing set. Large blocks below a particular cutoff value morph, and large blocks from that value to the end of the table don't morph.
Whenever the camera moves, the scrolling routine draws a row or column of graphics to the tilemap as it enters the view. When the screen shakes after a change in morph state, the scrolling routine runs 34 times, once for each visible column of the map. Each drawn row or column usually crosses five large blocks (totaling 32). When it reads each large block index from the map, if it is less than the map's morph cutoff, it adds the current morph state before looking up which medium blocks to use.
The 80 mapsets have 14,657 mt16s (average 183.2) for 58,628 bytes, 8699 mt32s (average 108.7) for 43,495 bytes, and 4650 mt64s (average 58.1) for 18,600 bytes, for a total of 120,723 bytes of metatile definitions. Combine this with 18,565 bytes of top-level map data for a total of 139,288 bytes of tilemap and attribute, saving 89.2 percent. (There are also five pairs of maps that appear completely identical.)
The 81,118 bytes of compressed unique tiles, 19,070 bytes of references to 27,184 bytes of shared tiles total 127,372 bytes. Add 120,723 bytes of metatile definitions, 18,565 bytes of tilemap, and 2731 bytes for palettes, and we're up to 269,391 bytes. This is 51.4 percent of the 524288 byte ROM capacity, an effective 0.0278 bits per pixel, and certainly a far cry from the original 696 megabyte estimate.
Intrepid modders seeking to produce a total conversion may notice that we left out a few minor overheads, such as parallax backgrounds, animated background tiles, and sprite sheets. We need to have some secret sauce.