IVF makes search fast by skipping most of the database, but leaves every vector uncompressed. One billion SIFT descriptors still cost 512 GB of RAM. Product Quantization (PQ), introduced by Jégou, Douze and Schmid (2011), is the compression trick FAISS builds on to shrink each vector to 8 bytes while keeping distance estimates meaningful.
Same centroids, a different job
§4 used centroids to partition the search space. PQ uses them to compresseach vector. Same K-Means math, new purpose: the centroid's index becomes the vector. If your codebook has centroids, any vector collapses to one integer in .
A pixel-sized analogy
A 24-bit RGB pixel can be any of 16.7 million colors. A GIF gives up some of that range: it picks a 256-color palette and replaces every pixel with an 8-bit index into it. Storage drops 3×, the picture still looks like the picture. That palette is a codebook; the index is a code.
A 128-D SIFT descriptor is just a very long pixel. We want the same trick: find a palette of representative vectors, replace every descriptor with its nearest palette index.
How many bits does an index cost?
To label centroids uniquely you need enough bits to distinguish all of them. Each bit doubles the number of labels, so the code length is bits. Concretely:
- centroids → bits per code (1 byte)
- centroids → bits
- centroids → 64 bits (8 bytes)
More centroids means the quantizer discriminates finer detail, and the code grows accordingly. The codebook designer is playing a bit-budget game.
The paper's aggressive target
Jégou et al. want each 128-D SIFT descriptor to become a 64-bit code, which is 0.5 bit per dimension, a 64× compression ratio against the 512-byte original. Half a bit per dimension is ambitious: it means the quantizer has to encode all the variation along each axis with a single binary-ish choice.
Hitting 64 bits with one flat codebook means picking centroids. That is roughly 18 quintillion, more than the number of grains of sand on Earth. Let's feel that number before we dismiss it.
Flat codebook feasibility
Pick a vector dimension and a bit budget per dimension to see the centroid count and storage footprint a single flat quantizer would need.
Vector dimension (D)
Bits per dimension
Total bits per code64bits
Centroids needed (k = 2^64)1.84 × 10^19centroids
Codebook storage9.44ZB
Impossible
Bigger than all cloud storage on Earth. Lloyd's algorithm would need roughly 5.53 × 10^20 training samples.
Why 264 is not a codebook
Three walls hit simultaneously when gets that large:
- Storage.Each centroid is 128 floats × 4 bytes = 512 bytes. Times centroids gives about 9.4 zettabytes, more than all cloud storage on Earth in 2025. You cannot persist the codebook, let alone page through it at query time.
- Training data.Lloyd's algorithm needs at least tens of samples per centroid to converge. That is training vectors. No dataset in existence is that large.
- Query cost.To encode one descriptor you compare it to every centroid. 18 quintillion distance computations per vector is not a latency you can ship.
The flat codebook dies on all three fronts. The paper's move is to keep the effective vocabulary the same size while shrinking the stored codebook by many orders of magnitude.
The factoring trick
Back to the GIF analogy. Instead of finding one 256-color palette that works for the whole image, cut the image into 8 tiles and give each tile its own 256-color palette. Each tile is now one byte; the image is 8 bytes; every tile had access to a full 256-color palette tuned to its own pixels. PQ does exactly this for vectors.
Formally: split into sub-vectors and learn one small sub-quantizer per block. The product quantizer is the tuple of their outputs:
With and centroids per block, the effective codebook has possible combinations (same order of magnitude as the impossible flat case), but we only store centroid vectors in each. Total codebook storage is (paper §II.B, Table I), small enough to live in L2 cache.
Each encoded vector fits in , one byte per block index. Ratio: 512 B to 8 B, a 64× drop in memory, with a codebook that actually fits on the machine.
The walkthrough below builds the full index on the paper's reference SIFT setting (), turning each concept above into a concrete operation on real numbers.