In this article I will describe a technique that can often compress the chroma data of an image to less than 0.5 bits per pixel on average, without visible artifacts.
What is chroma?
A common technique in the field of image and video compression is to convert RGB images to YCrCb before compression. Instead of representing a color as red, green and blue channels, we represent it as a luminance (Y) channel and two chrominance (Cr and Cb) channels.
One of the main advantages is that Y contains most of the information, while the chroma channels can be represented at a lower resolution without the human eye detecting it (something that was exploited as early as the 1930s-1960s for color television, where the chroma channels occupied less signal bandwidth than the luma channel).
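To make the conversion concrete, here is a minimal sketch of a common full-range BT.601-style RGB-to-YCbCr transform for a single pixel. The coefficients and the clamping are standard textbook values and serve only as an illustration; the exact conversion used by the code in this article may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Full-range BT.601-style RGB -> YCbCr conversion for one pixel.
// Illustrative only; the conversion in the actual tool may differ.
void RGBToYCbCr(uint8_t r, uint8_t g, uint8_t b,
                uint8_t& y, uint8_t& cb, uint8_t& cr) {
    const double yf  =  0.299    * r + 0.587    * g + 0.114    * b;
    const double cbf = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0;
    const double crf =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0;
    y  = static_cast<uint8_t>(std::clamp(yf,  0.0, 255.0) + 0.5);
    cb = static_cast<uint8_t>(std::clamp(cbf, 0.0, 255.0) + 0.5);
    cr = static_cast<uint8_t>(std::clamp(crf, 0.0, 255.0) + 0.5);
}
```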
Note: Throughout the article we assume that the luma channel (Y) is available when compressing and reconstructing the chroma data. The luma information would typically be compressed in some other clever manner (e.g. using a frequency transform like the DCT), but we ignore the details of how that is done in this article.
Key observation: Chroma is similar to luma
If you look at the chroma channels of a color image (like the one above), you will notice that they are quite similar to the luma channel. For instance, objects and edges appear in the same places, and fine patterns in the chroma channels are also present in the luma channel. In other words, we should be able to derive the chroma channels from the luma channel using some clever mapping function.
Approximating chroma from luma
Finding a good and accurate mapping function that works for the entire image would be very hard, but if we reduce the problem to smaller sub-blocks of the image (e.g. 16×16 pixels), the problem becomes easier.
Here we use a linear approximation of how the chroma value depends on the luma value, such that:
C(Y) = a * Y + b
For example, a least squares fit over the single 16×16 block below (256 values) gives the line a = 0.6504, b = 124.8751.
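For reference, such a fit can be computed with the classic closed-form least squares solution over the block's co-located luma/chroma samples. This is a generic textbook implementation, not the actual code from the repository:

```cpp
#include <cstddef>

// Least squares fit of C(Y) = a * Y + b over one block of n co-located
// luma/chroma samples (n = 256 for a 16x16 block). Generic textbook
// implementation; the fitting code in the actual tool may differ.
void FitChromaLine(const unsigned char* luma, const unsigned char* chroma,
                   std::size_t n, double& a, double& b) {
    double sumY = 0.0, sumC = 0.0, sumYY = 0.0, sumYC = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double y = luma[i];
        const double c = chroma[i];
        sumY  += y;
        sumC  += c;
        sumYY += y * y;
        sumYC += y * c;
    }
    const double denom = n * sumYY - sumY * sumY;
    if (denom != 0.0) {
        a = (n * sumYC - sumY * sumC) / denom;
        b = (sumC - a * sumY) / n;
    } else {
        // All luma values in the block are identical: use a flat line.
        a = 0.0;
        b = sumC / n;
    }
}
```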
Once a and b are known, an efficient way to encode the linear approximation is to use the two values:
- C1 = a * Ymin + b
- C2 = a * Ymax + b
Where Ymin and Ymax are the minimum and maximum values of Y in the block, respectively. That describes the line between the two points (Ymin, C1) and (Ymax, C2). In our example above Ymin = 47, Ymax = 80, C1 = 155.4 and C2 = 176.9 (as can be seen in the graph). Thus we can reconstruct the chroma values with:
C(Y) = (Y - Ymin) * (C2 - C1) / (Ymax - Ymin) + C1
(Note that we can calculate Ymin and Ymax from the luma channel at the time of decompression, so we do not have to store those values in the compressed data stream).
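Put together, the encoder stores the two endpoints and the decoder evaluates the line again from the decoded luma values. A rough sketch of both directions follows; the function names and rounding are my own, not taken from the repository:

```cpp
#include <algorithm>
#include <cstdint>

// Encoder side: evaluate the fitted line at a given luma value to get
// one of the stored endpoints (C1 at Ymin, C2 at Ymax).
uint8_t EncodeEndpoint(double a, double b, uint8_t y) {
    const double c = a * y + b;
    return static_cast<uint8_t>(std::clamp(c, 0.0, 255.0) + 0.5);
}

// Decoder side: reconstruct a chroma value from the pixel's luma value,
// the block's Ymin/Ymax (recomputed from the decoded luma channel) and
// the two stored endpoints C1/C2.
uint8_t ReconstructChroma(uint8_t y, uint8_t yMin, uint8_t yMax,
                          uint8_t c1, uint8_t c2) {
    if (yMax == yMin) {
        return c1;  // Flat luma block: the line degenerates to a constant.
    }
    const double t = static_cast<double>(y - yMin) / (yMax - yMin);
    const double c = c1 + t * (static_cast<double>(c2) - c1);
    return static_cast<uint8_t>(c + 0.5);
}
```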
If we represent the values C1 and C2 with eight bits each, we only need to store 16 bits per block per chroma channel, or a total of 32 bits (four bytes) per block for both the Cr and Cb channels.
Variable block sizes
The simple linear approximation only works well under certain conditions, and this method approximates some blocks better than others. We want to use large blocks for optimal compression, but we may need to use small blocks in parts of the image where the chroma channel is poorly approximated using a linear function.
One way to tackle this is to use varying block sizes. We can simply split a large block into four smaller blocks if we find that the approximation error is too large (according to some configured quality threshold). And we can do that recursively in several steps as needed.
We only need to store one bit of extra information per block (or sub-block), i.e.:
- 0 = Do not split block.
- 1 = Split into four sub-blocks.
I ended up using the block sizes 16×16, 8×8, 4×4 and 2×2 (note that for the smallest block size we do not need to store any “split” bit). I also used fewer bits to represent the chroma values for the smallest block size (6 bits instead of 8), since small blocks have a poor compression ratio and are usually only needed in high-frequency parts of the image (edges etc.) where accuracy matters less.
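The splitting decision maps naturally onto a small recursive function. Here is a rough sketch of how the encoder could traverse the quadtree; Image, BitWriter, ApproximationError() and EncodeLeaf() are hypothetical placeholders, not names from the actual code:

```cpp
// Recursive quadtree encoding of one chroma channel for one top-level block.
// Image, BitWriter, ApproximationError() and EncodeLeaf() are hypothetical
// helpers: EncodeLeaf() is assumed to fit the line and append C1/C2 (8 bits
// each, or 6 bits each for 2x2 blocks), and ApproximationError() is assumed
// to return the error of the linear fit for the given block.
void EncodeBlock(const Image& luma, const Image& chroma, BitWriter& out,
                 int x, int y, int size, double maxError) {
    if (size > 2) {
        if (ApproximationError(luma, chroma, x, y, size) > maxError) {
            out.PutBit(1);  // 1 = split into four sub-blocks.
            const int half = size / 2;
            EncodeBlock(luma, chroma, out, x,        y,        half, maxError);
            EncodeBlock(luma, chroma, out, x + half, y,        half, maxError);
            EncodeBlock(luma, chroma, out, x,        y + half, half, maxError);
            EncodeBlock(luma, chroma, out, x + half, y + half, half, maxError);
            return;
        }
        out.PutBit(0);  // 0 = keep this block as one linear approximation.
    }
    // 2x2 blocks are always leaves, so no split bit is stored for them.
    EncodeLeaf(luma, chroma, out, x, y, size);
}
```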
If you look closer, note that even in areas where there are distinct chroma details, large blocks can work well thanks to the luma-to-chroma mapping (from here on, images labeled “chroma only” show the combined Cr and Cb channels with Y set to constant 128):
Dealing with block artifacts
One side effect of the linear approximation is that the blocks may become apparent in the decoded image. This is mostly an issue when we allow a large approximation error (i.e. for aggressive compression). For instance, gradients and sharp edges may appear “blocky”.

One solution to this is to estimate the error around the edges of each block, based on the edges of neighboring blocks, and subtract the interpolated error value from the values in the block. This is easiest explained with a one-dimensional example:

It becomes trickier in 2D, and finding a good interpolation function is a matter of tuning. I ended up using linear interpolation in the X and Y directions, and then applying a weighting function that prioritizes the interpolation along the X axis when the pixel is closer to the left/right edges, and the interpolation along the Y axis when the pixel is closer to the top/bottom edges.
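As a concrete (and simplified) reading of that weighting scheme, the per-pixel correction inside a block could be computed roughly as below, given the estimated errors at the four block edges. How those edge errors are derived from the neighboring blocks is omitted, and the exact weighting in the actual code may differ:

```cpp
#include <algorithm>

// Interpolated correction for pixel (i, j) inside a size x size block,
// given estimated errors at the four block edges. Simplified sketch of
// the weighting described above; not the exact code from the tool.
double EdgeCorrection(int i, int j, int size,
                      double errLeft, double errRight,
                      double errTop, double errBottom) {
    // Normalized position inside the block: 0.0 at one edge, 1.0 at the other.
    const double fx = (i + 0.5) / size;
    const double fy = (j + 0.5) / size;

    // Linear interpolation of the edge errors along each axis.
    const double ex = (1.0 - fx) * errLeft + fx * errRight;
    const double ey = (1.0 - fy) * errTop  + fy * errBottom;

    // Weighting: near the left/right edges the X interpolation dominates,
    // near the top/bottom edges the Y interpolation dominates.
    const double dx = std::min(fx, 1.0 - fx);  // Distance to nearest vertical edge.
    const double dy = std::min(fy, 1.0 - fy);  // Distance to nearest horizontal edge.
    const double wx = dy / (dx + dy + 1e-9);   // Epsilon avoids division by zero.

    return wx * ex + (1.0 - wx) * ey;          // Value to subtract from the pixel.
}
```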
The result of the attempted error compensation is that the block edges become less visible, but the downside is that colors may appear smudged (colors tend to “leak”). However, the result is usually more pleasing to the eye, as our vision is more sensitive to sharp edges than to blurred-out chroma.

A nice property of this solution is that it requires no extra information in the compressed data stream. However, since we are just guessing what the edge error is, the algorithm can actually produce some undesired artifacts. For small block sizes and/or high resolution images, this is usually not noticeable unless you zoom in on the artifacts.
Results
Here are some example images at different compression levels. The figures (0.5 bpp etc.) denote how many bits per pixel are used for the chroma information, on average over the whole image (keep in mind that they do not include the luma information).
- Original: The original, uncompressed image. Effectively 16 bpp (8 bits per chroma channel).
- 0.5 bpp (3.1% of the original size): Usually no visual artifacts (although watch the color in the face of the fire breather closely).
- 0.2 bpp (1.3% of the original size): Some colors may be slightly washed out in some areas (see the motorbikes and the tree).
- 0.13 bpp (0.8% of the original size): Maximum compression and maximum quality loss (no blocks are split into smaller blocks). Visual artifacts are noticeable. Colors get smeared and washed out.
Click on the thumbnails to see the full image.
The experimental C++ code that implements this algorithm can be found in my yuvlab git repository.
Future improvements
Encoding
It should be possible to improve the compression ratio by using more fine-tuned encoding techniques. For instance, it turns out that the packed chroma data can be further compressed by about 30-50% by running it through general-purpose compressors such as gzip or lz4.
It may be possible to exploit similarities between nearby blocks, e.g. by using delta-prediction (similar to how pixel values are predicted in PNG).
It may also be possible to use non-linear coding of chroma values such that small deltas have more accuracy than large deltas, similar to DPCM encoding. For instance, instead of encoding C1 and C2, we could encode (C1 + C2) / 2 and (C2 - C1) / 2, where the latter can be treated as a “delta”.
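As an illustration of that idea, the transform and its inverse are trivial; the point is that the delta is usually small, which makes it a natural place to spend fewer bits or apply a non-linear quantizer. The names below are mine, not from the repository:

```cpp
#include <cstdint>

// Hypothetical mean/delta recoding of the two stored endpoints. The integer
// halving loses at most one code value on reconstruction, which is negligible
// compared to the other approximations in the scheme.
void ToMeanDelta(uint8_t c1, uint8_t c2, uint8_t& mean, int& delta) {
    mean  = static_cast<uint8_t>((static_cast<int>(c1) + c2) / 2);
    delta = (static_cast<int>(c2) - c1) / 2;   // Usually small: a good candidate
                                               // for non-linear / fewer-bit coding.
}

void FromMeanDelta(uint8_t mean, int delta, uint8_t& c1, uint8_t& c2) {
    c1 = static_cast<uint8_t>(mean - delta);
    c2 = static_cast<uint8_t>(mean + delta);
}
```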
Naturally, we should also explore different lossless data compression algorithms (Huffman, deflate, …).
Deblocking
A problem with the current deblocking algorithm is that it assumes that differences to neighboring blocks at the block edge are errors, even if they are not. This can lead to smoothing artifacts around sharp chroma changes near block edges (e.g. synthetic color bars that line up with block edges).
One improvement could be to disable deblocking for blocks that have a low approximation error. That would unfortunately require us to send extra information about the blocks (e.g. one extra bit per block that tells the decoder to not apply deblocking).
Another possibility is to send a single per-image value that defines the error threshold that the encoder used. It could be used for weighting the deblocking effort, so that for images that were compressed with a strict error threshold we would use less smoothing, and thus reduce/eliminate the artifacts.
Software
There is currently only an experimental tool that implements this algorithm, and it is not intended to be used for actual image compression. A library with a clean API, plus an example tool that creates compressed images, would be more useful.
An interactive Web tool would also be neat.
Disclaimer
The algorithm described in this article was developed from scratch by means of experimentation and trial and error (with no help from AI, not even for coding assistance). I have no idea whether this algorithm is novel or not, but it is based on individually well-known techniques (such as color space conversion, linear regression, adaptive refinement and interpolation).