Convolutions, Polynomials and Flipped Kernels

Original link: https://eli.thegreenplace.net/2025/convolutions-polynomials-and-flipped-kernels/

Polynomial multiplication and convolution are deeply connected: polynomial multiplication can be viewed as a special case of convolution. In polynomial multiplication, we determine each output coefficient by summing products of the original polynomials' coefficients. This mirrors exactly how convolution computes the output of a linear time-invariant (LTI) system: an LTI system's output is the convolution of the input signal with its impulse response. The convolution sum involves flipping one of the sequences, then computing sums of element-wise products as the flipped sequence slides along the other. This "flip and sum" procedure is the same as the one used to find polynomial coefficients, where the indices in the sum run in opposite directions. The Fourier transform further ties these concepts together via the convolution theorem: the Fourier transform of a convolution equals the plain product of the individual Fourier transforms, a property that makes efficient convolution possible with algorithms such as the FFT.


    This is a post about multiplying polynomials, convolution sums and the connection between them.

    Multiplying polynomials

    Suppose we want to multiply one polynomial by another:

    This is basic middle-school math - we start by cross-multiplying:

    And then collect all the terms together by adding up the coefficients:

    Let's look at a slightly different way to achieve the same result. Instead of cross-multiplying all terms and then adding up, we'll focus on which terms appear in the output, and for each such term - what its coefficient is going to be. This is easy to demonstrate with a table, where we lay out one polynomial horizontally and the other vertically. Note that we have to be explicit about the zero coefficient of x in the second polynomial:

    Table showing polynomial multiplication

    The diagonals that add up to each term in the output are highlighted. For example, to get the coefficient of one of the middle powers of x in the output, we have to add up the products along its highlighted diagonal.

    (if the second polynomial had an x term, there would be another component to add to that sum)

    For what follows, let's move to a somewhat more abstract representation: a polynomial P can be represented as a sum of coefficients $p_i$ multiplied by corresponding powers of x:

    $$P = \sum_i p_i x^i$$

    Suppose we have two polynomials, P and R. When we multiply them together, the resulting polynomial is S. Based on our insight from the table diagonals above, we can say that for each k:

    $$s_k = \sum_i p_i\, r_{k-i}$$

    And then the entire polynomial S is:

    $$S = \sum_k s_k x^k = \sum_k \left( \sum_i p_i\, r_{k-i} \right) x^k$$

    It's important to understand this formulation, since it's key to this post. Let's use our example polynomials to see how it works. First, we represent the two polynomials as sequences of coefficients (starting from power 0, so the coefficient of the constant term is first, and the coefficient of the highest power is last).

    In this notation, $p_0$ is the first element in the list for P, etc. To calculate the coefficient of $x^k$ in the product, we compute:

    $$s_k = \sum_i p_i\, r_{k-i}$$

    Expanding the sum over the non-zero coefficients of P (three of them in our example):

    $$s_k = p_0 r_k + p_1 r_{k-1} + p_2 r_{k-2}$$

    Similarly, we'll find each of the remaining coefficients, arriving at the same final polynomial as before.
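    As a quick illustration, here's how this summation might look in Python - a minimal sketch of my own (the function name and the coefficient-list convention are not from the post):

    ```python
    def poly_mul(p, r):
        """Multiply two polynomials given as coefficient lists.

        p[i] is the coefficient of x**i (constant term first), so the
        product s satisfies s[k] = sum over i of p[i] * r[k - i].
        """
        s = [0] * (len(p) + len(r) - 1)
        for k in range(len(s)):
            for i in range(len(p)):
                if 0 <= k - i < len(r):
                    s[k] += p[i] * r[k - i]
        return s
    ```

    For example, poly_mul([1, 2], [3, 0, 1]) computes $(1 + 2x)(3 + x^2)$ and returns [3, 6, 1, 2], i.e. $3 + 6x + x^2 + 2x^3$ - note the explicit zero for the missing x term, just like in the table above.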

    There's a nice graphical approach to multiplying polynomials using this idea of pairing up sums for each term in the output:

    Graphical representation of poly mul, part 1

    We start with the diagram on the left: one of the polynomials remains in its original form, while the other is flipped around (constant term first, highest power term last). We line up the polynomials vertically, and multiply the corresponding coefficients: the constant coefficient of the output is just the constant coefficient of the first polynomial times the constant coefficient of the second polynomial.

    The diagram on the right shows the next step: the second polynomial is shifted left by one term and lined up vertically again. The corresponding coefficients are multiplied, and then the results are added to obtain the coefficient of x in the output polynomial.

    We continue the process by shifting the lower polynomial left:

    Graphical representation of poly mul, part 2

    Calculating the coefficients of $x^2$ and then $x^3$. A couple more steps:

    Graphical representation of poly mul, part 3

    Now we have all the coefficients of the output. Take a moment to convince yourself that this approach is equivalent to the summation shown just before it, and also to the "diagonals in a table" approach shown further up. They all calculate the same thing!
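    The flip-and-slide procedure translates directly to code as well. Here's a sketch (again my own illustration, not code from the post) that flips one coefficient list and slides it along the other, summing the element-wise products at each position:

    ```python
    def poly_mul_slide(p, r):
        """Same product as poly_mul, computed the graphical way:
        flip r, then slide it along p one position at a time."""
        rf = r[::-1]  # the flipped polynomial: constant term last
        out = []
        for k in range(len(p) + len(r) - 1):
            total = 0
            for i, c in enumerate(rf):
                j = k - (len(rf) - 1) + i  # index of p lined up under rf[i]
                if 0 <= j < len(p):
                    total += c * p[j]
            out.append(total)
        return out
    ```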

    Hopefully it should be clear why the second polynomial is "flipped" to perform this procedure. This all comes down to which input terms pair up to calculate each output term. As seen above:

    $$s_k = \sum_i p_i\, r_{k-i}$$

    While the index i moves in one direction (from the low power terms to the high power terms) in P, the index k-i moves in the opposite direction in R.

    If this procedure reminds you of computing a convolution between two arrays, that's because it's exactly that!

    Signals, systems and convolutions

    The theory of signals and systems is a large topic (typically taught for one or two semesters in undergraduate engineering degrees), but here I want to focus on just one aspect of it which I find really elegant.

    Let's define discrete signals and systems first, restricting ourselves to 1D (similar ideas apply in higher dimensions):

    Discrete signal: An ordered sequence of numbers $x[n]$ with integer indices $n \in \mathbb{Z}$. Can also be thought of as a function $x: \mathbb{Z} \to \mathbb{R}$.

    Discrete system: A function mapping input signals $x[n]$ to output signals $y[n]$. For example, $y[n] = 2x[n]$ is a system that scales all signals by a factor of two.

    Here's an example signal:

    Basic signal

    This is a finite signal. All values not explicitly shown in the chart are assumed to be 0 (e.g. $x[-1] = 0$, $x[3] = 0$, and so on).

    A very important signal is the discrete impulse:

    $$\delta[n] = \begin{cases} 1 & n = 0 \\ 0 & n \neq 0 \end{cases}$$

    In graphical form, here's $\delta[n]$, as well as a couple of time-shifted variants of it. Note how we shift a signal left and right on the horizontal axis by adding to or subtracting from n, correspondingly. Take a moment to double check why this works.

    Discrete impulse function delta with shifts

    The impulse is useful because we can decompose any discrete signal into a sequence of scaled and shifted impulses. Our example signal has three non-zero values at indices 0, 1 and 2; we can represent it as follows:

    $$x[n] = 2\,\delta[n] + 2\,\delta[n-1] + 1\,\delta[n-2]$$

    More generally, a signal can be written as:

    $$x[n] = \sum_k x[k]\,\delta[n-k]$$

    (for all k where $x[k]$ is nonzero)
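    Here's a quick numeric check of this decomposition in Python, using the example signal's values (the dict-based signal representation is just my own convenience here):

    ```python
    def delta(n):
        """The discrete impulse: 1 at n == 0, and 0 everywhere else."""
        return 1 if n == 0 else 0

    x = {0: 2, 1: 2, 2: 1}  # the example signal; 0 at all other indices

    def x_from_impulses(n):
        """Reconstruct x[n] as a sum of scaled, shifted impulses."""
        return sum(xk * delta(n - k) for k, xk in x.items())

    assert all(x_from_impulses(n) == x.get(n, 0) for n in range(-5, 10))
    ```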

    In the study of signals and systems, linear and time-invariant (LTI) systems are particularly important.

    Linear: suppose $y_1[n]$ is the output of a system for input signal $x_1[n]$, and similarly $y_2[n]$ is the output for $x_2[n]$. A linear system outputs $a\,y_1[n] + b\,y_2[n]$ for the input $a\,x_1[n] + b\,x_2[n]$, where a and b are constants.

    Time-invariant: if we delay the input signal by some constant k: $x[n-k]$, the output is similarly delayed: $y[n-k]$. In other words, the response of the system has a similar shape, no matter when the signal was received (it behaves today similarly to the way it behaved yesterday).
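    To make these two properties concrete, here's a small numeric check against a hypothetical LTI system - a two-point moving sum, which is my example rather than one from the post:

    ```python
    def system(x):
        """A sample LTI system: y[n] = x[n] + x[n-1].
        Signals are dicts mapping index -> value (0 when absent)."""
        support = set(x) | {n + 1 for n in x}
        return {n: x.get(n, 0) + x.get(n - 1, 0) for n in support}

    def shift(x, k):
        """Delay a signal by k samples: x[n] becomes x[n - k]."""
        return {n + k: v for n, v in x.items()}

    x1, x2 = {0: 2, 1: 2, 2: 1}, {0: 1, 3: -4}
    y1, y2 = system(x1), system(x2)

    # Linearity: the response to 3*x1 - 2*x2 equals 3*y1 - 2*y2.
    combined = {n: 3 * x1.get(n, 0) - 2 * x2.get(n, 0)
                for n in set(x1) | set(x2)}
    lhs = system(combined)
    assert all(lhs.get(n, 0) == 3 * y1.get(n, 0) - 2 * y2.get(n, 0)
               for n in range(-5, 10))

    # Time-invariance: shifting the input shifts the output identically.
    assert system(shift(x1, 2)) == shift(system(x1), 2)
    ```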

    LTI systems are important because of the decomposition of a signal into impulses discussed above. Suppose we want to characterize a system: what it does to an arbitrary signal. For an LTI system, all we need to describe is its response to an impulse!

    If the response of our system to $\delta[n]$ is $h[n]$, then:

    • Its response to $c\,\delta[n]$ is $c\,h[n]$, for any constant c, because the system is linear.
    • Its response to $\delta[n-k]$ is $h[n-k]$, for any time shift k, because the system is time-invariant.

    We'll combine these and use linearity again (note that the $x[k]$ in the following are just constants); the response to a signal decomposed into a sum of shifted and scaled impulses:

    $$x[n] = \sum_k x[k]\,\delta[n-k]$$

    Is:

    $$y[n] = \sum_k x[k]\,h[n-k]$$

    This operation is called the convolution between sequences $x[n]$ and $h[n]$, and is denoted with the $*$ operator:

    $$y[n] = (x * h)[n] = \sum_k x[k]\,h[n-k]$$

    Let's work through an example. Suppose we have an LTI system with the following response to an impulse:

    Impulse response h[n]

    The response has non-zero values $h[0]$ and $h[1]$, and zeros everywhere else. Recall our sample signal from the top of this section (the sequence 2, 2, 1). We can decompose it to a sequence of scaled and shifted impulses, and then calculate the system response to each of them. Like this:

    Decomposed x[n] and the h[n] for each component

    The top row shows the input signal decomposed into scaled and shifted impulses; the bottom row is the corresponding system response to each input. If we carefully add up the responses for each n, we'll get the system response to our input:

    y[n] full system response

    Now, let's calculate the same output, but this time using the convolution sum. First, we'll represent the signal x and the impulse response h as sequences (just like we did with polynomials), with $x[0]$ first, then $x[1]$, etc.

    The convolution sum is:

    $$y[n] = \sum_k x[k]\,h[n-k]$$

    Recall that k ranges over all the non-zero elements in x. Let's calculate each element of $y[n]$ (noting that h is nonzero only for indices 0 and 1):

    $$\begin{aligned}
    y[0] &= x[0]\,h[0] \\
    y[1] &= x[0]\,h[1] + x[1]\,h[0] \\
    y[2] &= x[1]\,h[1] + x[2]\,h[0] \\
    y[3] &= x[2]\,h[1]
    \end{aligned}$$

    All subsequent values of y are zero because k only ranges up to 2 and $h[n-k] = 0$ for $n-k > 1$.
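    In code, NumPy's convolve function computes exactly this sum. Since the actual values of h appear only in the post's figures, the h below uses placeholder values of my own choosing:

    ```python
    import numpy as np

    x = [2, 2, 1]  # the example signal
    h = [1, 3]     # hypothetical h[0], h[1]; the real values are in the figures

    # np.convolve implements y[n] = sum over k of x[k] * h[n - k]
    y = np.convolve(x, h)
    print(y)       # [2 8 7 3]: values y[0]..y[3], zero afterwards
    ```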

    If you look carefully at this calculation, you'll notice that h is "flipped" in relation to x, just like with the polynomials:

    Convolution between signals by flipping one and sliding
    • We start with x (black) and the flipped h (blue), and line up their first non-zero elements. This computes $y[0]$.
    • In subsequent steps, the flipped h slides right, one element at a time, and the next value of y is computed by adding up the element-wise products of the lined-up x and h.

    Just like with polynomials, the reason why one of the inputs is flipped is clear from the definition of the convolution sum, where one of the indices increases (k), while the other decreases (n-k).

    Properties of convolution

    The convolution operation has many useful algebraic properties: linearity, associativity, commutativity, distributivity, etc.

    The commutative property means that when computing convolutions graphically, it doesn't matter which of the signals is "flipped", because (substituting $j = n - k$ in the sum):

    $$\sum_k x[k]\,h[n-k] = \sum_j x[n-j]\,h[j]$$

    And therefore:

    $$x * h = h * x$$

    But the most important property of the convolution is how it behaves in the frequency domain. If we denote by $\mathcal{F}\{f\}$ the Fourier transform of signal f, then the convolution theorem states:

    $$\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$$

    The Fourier transform of a convolution is equal to simple multiplication between the Fourier transforms of its operands. This fact - along with fast algorithms like the FFT - makes it possible to implement convolutions very efficiently.
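    Here's what that looks like in practice - a minimal NumPy sketch, zero-padding to the full output length so that the circular convolution computed via the DFT matches the linear one:

    ```python
    import numpy as np

    def conv_via_fft(x, h):
        """Convolve by multiplying in the frequency domain."""
        n = len(x) + len(h) - 1        # full linear-convolution length
        X = np.fft.rfft(x, n)          # transform, zero-padded to length n
        H = np.fft.rfft(h, n)
        return np.fft.irfft(X * H, n)  # back to the "time" domain

    x, h = [2, 2, 1], [1, 3]           # same hypothetical h as before
    assert np.allclose(conv_via_fft(x, h), np.convolve(x, h))
    ```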

    This is a deep and fascinating topic, but we'll leave it as a story for another day.

