In this article I will describe a technique that can often compress the chroma data of an image to less than 0.5 bits per pixel on average, without visible artifacts.
What is chroma?
A common technique in the field of image and video compression is to convert RGB images to YCrCb before compression. Instead of representing a color as red, green and blue channels, we represent it as a luminance (Y) channel and two chrominance (Cr and Cb) channels.
One of the main advantages is that Y contains most of the information, while the chroma channels can be represented at a lower resolution without the human eye detecting it (something that was exploited as early as the 1930s-1960s for color television, where the chroma channels occupied less signal bandwidth than the luma channel).
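To make the conversion concrete, here is a minimal sketch of a common full-range BT.601-style RGB-to-YCbCr transform for a single pixel. The coefficients and the clamping are standard textbook values and serve only as an illustration; the exact conversion used by the code in this article may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Full-range BT.601-style RGB -> YCbCr conversion for one pixel.
// Illustrative only; the conversion in the actual tool may differ.
void RGBToYCbCr(uint8_t r, uint8_t g, uint8_t b,
                uint8_t& y, uint8_t& cb, uint8_t& cr) {
    const double yf  =  0.299    * r + 0.587    * g + 0.114    * b;
    const double cbf = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0;
    const double crf =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0;
    y  = static_cast<uint8_t>(std::clamp(yf,  0.0, 255.0) + 0.5);
    cb = static_cast<uint8_t>(std::clamp(cbf, 0.0, 255.0) + 0.5);
    cr = static_cast<uint8_t>(std::clamp(crf, 0.0, 255.0) + 0.5);
}
```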
Note: Throughout the article we assume that the luma channel (Y) is available when compressing and reconstructing the chroma data. The luma information would typically be compressed in some other clever manner (e.g. using a frequency transform like the DCT), but we ignore the details of how that is done in this article.
Key observation: Chroma is similar to luma
If you look at the chroma channels of a color image (like the one above), you will notice that they are quite similar to the luma channel. For instance, objects and edges appear in the same places, and fine patterns in the chroma channels are also present in the luma channel. In other words, we should be able to derive the chroma channels from the luma channel using some clever mapping function.
Approximating chroma from luma
Finding a good and accurate mapping function that works for the entire image would be very hard, but if we reduce the problem to smaller sub-blocks of the image (e.g. 16×16 pixels), the problem becomes easier.
Here we use a linear approximation of how the chroma value depends on the luma value, such that:
C(Y) = a * Y + b
For example, a least squares fit over the single 16×16 block below (256 values) gives the line a = 0.6504, b = 124.8751.
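For reference, such a fit can be computed with the classic closed-form least squares solution over the block's co-located luma/chroma samples. This is a generic textbook implementation, not the actual code from the repository:

```cpp
#include <cstddef>

// Least squares fit of C(Y) = a * Y + b over one block of n co-located
// luma/chroma samples (n = 256 for a 16x16 block). Generic textbook
// implementation; the fitting code in the actual tool may differ.
void FitChromaLine(const unsigned char* luma, const unsigned char* chroma,
                   std::size_t n, double& a, double& b) {
    double sumY = 0.0, sumC = 0.0, sumYY = 0.0, sumYC = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double y = luma[i];
        const double c = chroma[i];
        sumY  += y;
        sumC  += c;
        sumYY += y * y;
        sumYC += y * c;
    }
    const double denom = n * sumYY - sumY * sumY;
    if (denom != 0.0) {
        a = (n * sumYC - sumY * sumC) / denom;
        b = (sumC - a * sumY) / n;
    } else {
        // All luma values in the block are identical: use a flat line.
        a = 0.0;
        b = sumC / n;
    }
}
```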
Once a and b are known, an efficient way to encode the linear approximation is to use the two values:
- C1 = a * Ymin + b
- C2 = a * Ymax + b
Where Ymin and Ymax are the minimum and maximum values of Y in the block, respectively. That describes the line between the two points (Ymin, C1) and (Ymax, C2). In our example above Ymin = 47, Ymax = 80, C1 = 155.4 and C2 = 176.9 (as can be seen in the graph). Thus we can reconstruct the chroma values with:
C(Y) = (Y - Ymin) * (C2 - C1) / (Ymax - Ymin) + C1
(Note that we can calculate Ymin and Ymax from the luma channel at the time of decompression, so we do not have to store those values in the compressed data stream).
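Put together, the encoder stores the two endpoints and the decoder evaluates the line again from the decoded luma values. A rough sketch of both directions follows; the function names and rounding are my own, not taken from the repository:

```cpp
#include <algorithm>
#include <cstdint>

// Encoder side: evaluate the fitted line at a given luma value to get
// one of the stored endpoints (C1 at Ymin, C2 at Ymax).
uint8_t EncodeEndpoint(double a, double b, uint8_t y) {
    const double c = a * y + b;
    return static_cast<uint8_t>(std::clamp(c, 0.0, 255.0) + 0.5);
}

// Decoder side: reconstruct a chroma value from the pixel's luma value,
// the block's Ymin/Ymax (recomputed from the decoded luma channel) and
// the two stored endpoints C1/C2.
uint8_t ReconstructChroma(uint8_t y, uint8_t yMin, uint8_t yMax,
                          uint8_t c1, uint8_t c2) {
    if (yMax == yMin) {
        return c1;  // Flat luma block: the line degenerates to a constant.
    }
    const double t = static_cast<double>(y - yMin) / (yMax - yMin);
    const double c = c1 + t * (static_cast<double>(c2) - c1);
    return static_cast<uint8_t>(c + 0.5);
}
```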
If we represent the values C1 and C2 with eight bits each, we only need to store 16 bits per block per chroma channel, or a total of 32 bits (four bytes) per block for both the Cr and Cb channels.
Variable block sizes
The simple linear approximation only works well under certain conditions, and this method approximates some blocks better than others. We want to use large blocks for optimal compression, but we may need to use small blocks in parts of the image where the chroma channel is poorly approximated using a linear function.
One way to tackle this is to use varying block sizes. We can simply split a large block into four smaller blocks if we find that the approximation error is too large (according to some configured quality threshold). And we can do that recursively in several steps as needed.
We only need to store one bit of extra information per block (or sub-block), i.e.:
- 0 = Do not split block.
- 1 = Split into four sub-blocks.
I ended up using the block sizes 16×16, 8×8, 4×4 and 2×2 (note that for the smallest block size we do not need to store any “split” bit). I also used fewer bits to represent the chroma values for the smallest block size (6 bits instead of 8), since small blocks have a poor compression ratio and are usually only needed in high-frequency parts of the image (edges etc.) where accuracy matters less.
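The splitting decision maps naturally onto a small recursive function. Here is a rough sketch of how the encoder could traverse the quadtree; Image, BitWriter, ApproximationError() and EncodeLeaf() are hypothetical placeholders, not names from the actual code:

```cpp
// Recursive quadtree encoding of one chroma channel for one top-level block.
// Image, BitWriter, ApproximationError() and EncodeLeaf() are hypothetical
// helpers: EncodeLeaf() is assumed to fit the line and append C1/C2 (8 bits
// each, or 6 bits each for 2x2 blocks), and ApproximationError() is assumed
// to return the error of the linear fit for the given block.
void EncodeBlock(const Image& luma, const Image& chroma, BitWriter& out,
                 int x, int y, int size, double maxError) {
    if (size > 2) {
        if (ApproximationError(luma, chroma, x, y, size) > maxError) {
            out.PutBit(1);  // 1 = split into four sub-blocks.
            const int half = size / 2;
            EncodeBlock(luma, chroma, out, x,        y,        half, maxError);
            EncodeBlock(luma, chroma, out, x + half, y,        half, maxError);
            EncodeBlock(luma, chroma, out, x,        y + half, half, maxError);
            EncodeBlock(luma, chroma, out, x + half, y + half, half, maxError);
            return;
        }
        out.PutBit(0);  // 0 = keep this block as one linear approximation.
    }
    // 2x2 blocks are always leaves, so no split bit is stored for them.
    EncodeLeaf(luma, chroma, out, x, y, size);
}
```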
If you look closer, note that even in areas where there are distinct chroma details, large blocks can work well thanks to the luma-to-chroma mapping (from here on, images labeled “chroma only” show the combined Cr and Cb channels with Y set to constant 128):
Dealing with block artifacts
One side effect of the linear approximation is that the blocks may become apparent in the decoded image. This is mostly an issue when we allow a large approximation error (i.e. for aggressive compression). For instance, gradients and sharp edges may appear “blocky”.

One solution to this is to estimate the error around the edges of each block, based on the edges of neighboring blocks, and subtract the interpolated error value from the values in the block. This is easiest explained with a one-dimensional example:

It becomes trickier in 2D, and finding a good interpolation function is a matter of tuning. I ended up using linear interpolation in the X and Y directions, and then applying a weighting function that prioritizes the interpolation along the X axis when the pixel is closer to the left/right edges, and the interpolation along the Y axis when the pixel is closer to the top/bottom edges.
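As a concrete (and simplified) reading of that weighting scheme, the per-pixel correction inside a block could be computed roughly as below, given the estimated errors at the four block edges. How those edge errors are derived from the neighboring blocks is omitted, and the exact weighting in the actual code may differ:

```cpp
#include <algorithm>

// Interpolated correction for pixel (i, j) inside a size x size block,
// given estimated errors at the four block edges. Simplified sketch of
// the weighting described above; not the exact code from the tool.
double EdgeCorrection(int i, int j, int size,
                      double errLeft, double errRight,
                      double errTop, double errBottom) {
    // Normalized position inside the block: 0.0 at one edge, 1.0 at the other.
    const double fx = (i + 0.5) / size;
    const double fy = (j + 0.5) / size;

    // Linear interpolation of the edge errors along each axis.
    const double ex = (1.0 - fx) * errLeft + fx * errRight;
    const double ey = (1.0 - fy) * errTop  + fy * errBottom;

    // Weighting: near the left/right edges the X interpolation dominates,
    // near the top/bottom edges the Y interpolation dominates.
    const double dx = std::min(fx, 1.0 - fx);  // Distance to nearest vertical edge.
    const double dy = std::min(fy, 1.0 - fy);  // Distance to nearest horizontal edge.
    const double wx = dy / (dx + dy + 1e-9);   // Epsilon avoids division by zero.

    return wx * ex + (1.0 - wx) * ey;          // Value to subtract from the pixel.
}
```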
The result of the attempted error compensation is that the block edges become less visible, but the downside is that colors may appear smudged (colors tend to “leak”). However, the result is usually more pleasing to the eye, as our vision is more sensitive to sharp edges than to blurred-out chroma.

A nice property of this solution is that it requires no extra information in the compressed data stream. However, since we are just guessing what the edge error is, the algorithm can actually produce some undesired artifacts. For small block sizes and/or high resolution images, this is usually not noticeable unless you zoom in on the artifacts.
Results
Here are some example images at different compression levels. The figures (0.5 bpp etc.) denote how many bits per pixel are used for the chroma information, on average over the whole image (keep in mind that they do not include the luma information).
- Original: The original, uncompressed image. Effectively 16 bpp (8 bits per chroma channel).
- 0.5 bpp (3.1% of the original size): Usually no visual artifacts (although watch the color in the face of the fire breather closely).
- 0.2 bpp (1.3% of the original size): Some colors may be slightly washed out in some areas (see the motorbikes and the tree).
- 0.13 bpp (0.8% of the original size): Maximum compression and maximum quality loss (no blocks are split into smaller blocks). Visual artifacts are noticeable. Colors get smeared and washed out.
Click on the thumbnails to see the full image.
The experimental C++ code that implements this algorithm can be found in my yuvlab git repository.
Future improvements
Encoding
It should be possible to improve the compression ratio by using more fine-tuned encoding techniques. For instance, it turns out that the packed chroma data can be further compressed by about 30-50% by running it through general-purpose compressors such as gzip or lz4.
It may be possible to exploit similarities between nearby blocks, e.g. by using delta-prediction (similar to how pixel values are predicted in PNG).
It may also be possible to use non-linear coding of chroma values such that small deltas have more accuracy than large deltas, similar to DPCM encoding. For instance, instead of encoding C1 and C2, we could encode (C1 + C2) / 2 and (C2 - C1) / 2, where the latter can be treated as a “delta”.
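As an illustration of that idea, the transform and its inverse are trivial; the point is that the delta is usually small, which makes it a natural place to spend fewer bits or apply a non-linear quantizer. The names below are mine, not from the repository:

```cpp
#include <cstdint>

// Hypothetical mean/delta recoding of the two stored endpoints. The integer
// halving loses at most one code value on reconstruction, which is negligible
// compared to the other approximations in the scheme.
void ToMeanDelta(uint8_t c1, uint8_t c2, uint8_t& mean, int& delta) {
    mean  = static_cast<uint8_t>((static_cast<int>(c1) + c2) / 2);
    delta = (static_cast<int>(c2) - c1) / 2;   // Usually small: a good candidate
                                               // for non-linear / fewer-bit coding.
}

void FromMeanDelta(uint8_t mean, int delta, uint8_t& c1, uint8_t& c2) {
    c1 = static_cast<uint8_t>(mean - delta);
    c2 = static_cast<uint8_t>(mean + delta);
}
```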
Naturally, we should also explore different lossless data compression algorithms (Huffman, deflate, …).
Deblocking
A problem with the current deblocking algorithm is that it assumes that differences to neighboring blocks at the block edge are errors, even if they are not. This can lead to smoothing artifacts around sharp chroma changes near block edges (e.g. synthetic color bars that line up with block edges).
One improvement could be to disable deblocking for blocks that have a low approximation error. That would unfortunately require us to send extra information about the blocks (e.g. one extra bit per block that tells the decoder to not apply deblocking).
Another possibility is to send a single per-image value that defines the error threshold that the encoder used. It could be used for weighting the deblocking effort, so that for images that were compressed with a strict error threshold we would use less smoothing, and thus reduce/eliminate the artifacts.
Software
There is currently only an experimental tool that implements this algorithm, and it is not intended to be used for actual image compression. A library with a clean API, plus an example tool that creates compressed images, would be more useful.
An interactive Web tool would also be neat.
Disclaimer
The algorithm described in this article was developed from scratch by means of experimentation and trial and error (with no help from AI, not even for coding assistance). I have no idea whether this algorithm is novel or not, but it is based on individually well-known techniques (such as color space conversion, linear regression, adaptive refinement and interpolation).