IBM Patented Euler's 200 Year Old Math Technique for 'AI Interpretability'

原始链接: https://leetarxiv.substack.com/p/ibm-patented-eulers-fractions

## IBM Patents Centuries-Old Math: A Critique of 'CoFrNets'

Following the shutdown of Papers With Code, LeetArxiv highlights a worrying development: IBM has filed a patent on applying an old number-theory technique, generalized continued fractions, to PyTorch neural networks. Essentially, the authors of the 2021 paper "CoFrNets" implemented this mathematical concept, relied on PyTorch's automatic differentiation (`backward()`), and rebranded it with new terms such as "ladders" and the "1/z nonlinearity". The paper claims that continued fractions are universal approximators, which is well known, and largely re-implements existing mathematics. Despite only modest results (61% accuracy on a Waveform dataset), the authors filed a patent on their work. This raises concerns about "patent trolling": collecting rent on existing knowledge. The core contribution is not novel; it simply applies a well-known mathematical construction inside a neural-network framework. The authors themselves acknowledge that the technique suffers from vanishing gradients, a known limitation of infinite series that Euler pointed out as early as 1785. IBM may now hold a legal claim over a basic mathematical concept, affecting fields from numerical analysis to elliptic-curve modeling. Code is available on GitHub and Google Colab.

## IBM's Patent and the Software-Patent Debate

IBM's recent patent filing on using Euler's 200-year-old continued fractions to improve "AI interpretability" sparked a Hacker News debate about the validity and necessity of software patents. Many commenters strongly oppose software patents, arguing they are often obvious, hinder innovation, and grant unwarranted monopolies. Some users note that software is better protected by copyright and question the benefit of patents when disclosure would typically happen even without them. The discussion highlights concerns about "patent trolls" and how easily patents are granted for applying existing techniques in new domains. Some defend the patent system, arguing that it incentivizes innovation by rewarding investment, while others suggest drastically shorter patent terms (e.g., 2-5 years) to better match the pace of software development. Specific examples such as the RSA and FreeType hinting patents are cited to illustrate both the potential benefits and drawbacks of software patents. Ultimately, the dominant sentiment leans toward skepticism, with many advocating major reform of software patents or their outright abolition.

Original Article
LeetArxiv is a successor to Papers With Code after the latter shut down.
Quick Summary
IBM owns the patent to the use of derivatives to find the convergents of a generalized continued fraction.
Here’s the bizarre thing: all they did was implement a number theory technique by Gauss, Euler and Ramanujan in PyTorch and call backward() on the computation graph.
Now IBM’s patent trolls can charge rent on a math technique that’s existed for over 200 years.

As always, code is available on Google Colab and GitHub.

The 2021 paper CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions (Puri et al., 2021) investigates the use of continued fractions in neural network design.

The paper takes 13 pages to assert that continued fractions (just like MLPs) are universal approximators.

The authors reinvent the wheel countless times:

  1. They rebrand continued fractions as ‘ladders’.

  2. They label basic division ‘The 1/z nonlinearity’.

  3. Ultimately, they take the well-defined concept of Generalized Continued Fractions and call them CoFrNets.

Authors rename generalized continued fractions. Taken from page 2 of (Puri et al., 2021)

Honestly, the paper is full of pretentious nonsense like this:

The authors crack jokes while collecting rent on 200 years of math knowledge. Taken from page 2

Simple continued fractions are mathematical expressions of the form:

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$

where $p_n / q_n$ is the $n$th convergent (Cook, 2022).
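
For concreteness (a sketch of my own, not taken from the paper), the convergents $p_n/q_n$ can be generated with the classical recurrence $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$:

# Convergents p_n/q_n of a simple continued fraction [a0; a1, a2, ...]
# via p_n = a_n*p_{n-1} + p_{n-2} and q_n = a_n*q_{n-1} + q_{n-2}.
def convergents(a):
    p_prev, p = 1, a[0]
    q_prev, q = 0, 1
    yield p, q
    for a_n in a[1:]:
        p, p_prev = a_n * p + p_prev, p
        q, q_prev = a_n * q + q_prev, q
        yield p, q

# Opening terms of pi's continued fraction: [3; 7, 15, 1, 292]
for p, q in convergents([3, 7, 15, 1, 292]):
    print(f"{p}/{q} = {p / q:.10f}")
# Prints the familiar approximations 3, 22/7, 333/106, 355/113, 103993/33102.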

Continued fractions have been used by mathematicians to:

  1. Approximate Pi (MJD, 2014).

  2. Design gear systems (Brocot, 1861).

  3. Even Ramanujan’s math tricks utilised continued fractions (Barrow, 2000).

Continued fractions are well-studied and previous LeetArxiv guides include (Lehmer, 1931): The Continued Fraction Factorization Method and Stern-Brocot Fractions as a floating-point alternative.

If your background is in AI, a continued fraction looks exactly like a linear layer whose bias term is replaced with another linear layer.
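
A minimal sketch of that analogy (my own illustration, not the authors’ code): each ‘ladder’ rung is a bias-free linear map, and the reciprocal of the next rung sits where the bias would go.

import torch
import torch.nn as nn

# Three bias-free linear layers; each outer rung adds 1/(next rung) in place of a bias.
rungs = [nn.Linear(4, 1, bias=False) for _ in range(3)]

def ladder(x, k=0):
    if k == len(rungs) - 1:
        return rungs[k](x)                       # innermost rung: plain linear map
    return rungs[k](x) + 1.0 / ladder(x, k + 1)  # outer rungs: linear map + reciprocal
    # (The paper clamps the reciprocal away from zero; omitted here for brevity.)

x = torch.randn(8, 4)
print(ladder(x).shape)  # torch.Size([8, 1])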

(Jones, 1980) defines generalized continued fractions as expressions of the form:

$$b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \cdots}}}$$

written more economically as:

$$b_0 + \operatorname*{K}_{n=1}^{\infty} \frac{a_n}{b_n}$$

where the $a_n$ and $b_n$ can be integers or polynomials.
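
For intuition (again my own sketch, not from the paper), a truncated generalized continued fraction can be evaluated from the innermost term outward:

# Evaluate b0 + a1/(b1 + a2/(b2 + ... + aN/bN)) from the innermost term outward.
def eval_gcf(b0, a, b):
    value = 0.0
    for a_n, b_n in zip(reversed(a), reversed(b)):
        value = a_n / (b_n + value)
    return b0 + value

# With every a_n = b_n = 1 the expression collapses to the golden ratio
# phi = 1 + 1/(1 + 1/(1 + ...)) ~ 1.6180339887.
print(eval_gcf(1, [1] * 40, [1] * 40))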

The authors simply implement a continued fraction library in PyTorch and call the backward() function on the resulting computation graph.

That is, they chain linear neural network layers and use the reciprocal (not ReLU) as the primary non-linearity.

Then they replace the bias term of the current linear layer with another linear layer. This is a generalized continued fraction.

In PyTorch, their architecture resembles this:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

class CoFrNet(nn.Module): 
    def __init__(self, input_dim, num_ladders=10, depth=6, num_classes=3, epsilon=0.1):
        super(CoFrNet, self).__init__()
        self.depth = depth
        self.epsilon = epsilon
        self.num_classes = num_classes

        #Linear layers for each step in each ladder
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(num_ladders, input_dim)) for _ in range(depth + 1)
        ])

        #Output weights for each class
        self.output_weights = nn.Parameter(torch.randn(num_ladders, num_classes))

    def safe_reciprocal(self, x):
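        # Clamp |x| away from zero so the 1/z nonlinearity cannot blow up:
        # magnitudes below epsilon are treated as +/- epsilon, sign preserved.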
        return torch.sign(x) * 1.0 / torch.clamp(torch.abs(x), min=self.epsilon)

    def forward(self, x):
        batch_size = x.shape[0]
        num_ladders = self.weights[0].shape[0]

        # Compute continued fractions for all ladders
        current = torch.einsum('nd,bd->bn', self.weights[self.depth], x)

        # Build continued fractions from bottom to top
        for k in range(self.depth - 1, -1, -1):
            a_k = torch.einsum('nd,bd->bn', self.weights[k], x)
            current = a_k + self.safe_reciprocal(current)

        # Linear combination for each class
        output = torch.einsum('bn,nc->bc', current, self.output_weights)
        return output

def test_on_waveform():
    # Load Waveform-like dataset
    X, y = make_classification(
        n_samples=5000, n_features=40, n_classes=3, n_informative=10,
        random_state=42
    )

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Standardize
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Convert to torch tensors
    X_train = torch.FloatTensor(X_train)
    X_test = torch.FloatTensor(X_test)
    y_train = torch.LongTensor(y_train)
    y_test = torch.LongTensor(y_test)

    # Model
    input_dim = 40
    num_classes = 3
    model = CoFrNet(input_dim, num_ladders=20, depth=6, num_classes=num_classes)

    # Training
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    epochs = 100
    batch_size = 64

    for epoch in range(epochs):
        model.train()
        permutation = torch.randperm(X_train.size()[0])

        for i in range(0, X_train.size()[0], batch_size):
            indices = permutation[i:i+batch_size]
            batch_x, batch_y = X_train[indices], y_train[indices]

            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

        # Validation
        if epoch % 10 == 0:
            model.eval()
            with torch.no_grad():
                train_outputs = model(X_train)
                train_preds = torch.argmax(train_outputs, dim=1)
                train_acc = (train_preds == y_train).float().mean()

                test_outputs = model(X_test)
                test_preds = torch.argmax(test_outputs, dim=1)
                test_acc = (test_preds == y_test).float().mean()

            print(f'Epoch {epoch:3d} | Loss: {loss.item():.4f} | Train Acc: {train_acc:.4f} | Test Acc: {test_acc:.4f}')

    print(f"\nFinal Test Accuracy: {test_acc:.4f}")
    return test_acc.item()

if __name__ == "__main__":
    accuracy = test_on_waveform()
    print(f"CoFrNet achieved {accuracy:.1%} accuracy on Waveform dataset")

Testing on a non-linear waveform dataset, we observe these results:

An accuracy of 61%.

Nowhere near SOTA and that’s expected.

Continued fractions are well-studied, and any number theorist would tell you the gradients vanish, i.e. there are limits to the differentiability of the power series.

The authors use power series of continued fractions to interpret their moderate success. Taken from page 6 of (Puri et al., 2021)

Even Euler’s original work (Euler, 1785) alludes to this fact: it is an infinite series, so optimization by differentiation has its limits.

PyTorch’s autodiff engine replaces the differentiable series with a differentiable computation graph.

The authors simply implemented a continued fraction library in PyTorch and, as expected, saw that the gradients could be optimized.
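
A quick way to see the vanishing-gradient issue (my own illustration, not the authors’): differentiate the value of a deep ladder with respect to its innermost coefficient and watch the gradient shrink with depth.

import torch

# Gradient of a + 1/(a + 1/(... + 1/a)) with respect to the innermost
# coefficient, for increasing ladder depth (all coefficients set to 1).
for depth in [2, 4, 8, 16, 32]:
    coeffs = [torch.tensor(1.0, requires_grad=True) for _ in range(depth)]
    value = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        value = a + 1.0 / value
    value.backward()
    print(f"depth {depth:2d}: d(value)/d(innermost) = {coeffs[-1].grad.item():.2e}")
# Each extra rung multiplies the gradient by roughly -1/value^2 (about -0.38 here),
# so the innermost gradient decays geometrically with depth.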

As the reviewers note, the idea seems novel, but the technique is nowhere near SOTA, and the truth is that continued fractions have existed for a while: the authors simply replace the linear layers of a neural network with generalized continued fractions.

Here’s the bizarre outcome: the authors filed for a patent on their ‘buzzword-laden’ paper in 2022.

Their patent was published and its status marked as pending.

Here’s the thing:

  1. Continued fractions have existed longer than IBM.

  2. Differentiability of continued fractions is well-known.

  3. The authors did not do anything different from Euler’s 1785 work.

Now, if IBM feels litigious, they can sue Sage, Mathematica, Wolfram, or even you for coding a 249-year-old math technique. Several groups are affected:

  1. Mechanical Engineers, Roboticists, and Industrialists

  2. Pure Mathematicians and Math Educators

    I’m a Math PhD and I learnt about the patent while investigating continued fractions and their relation to elliptic curves (van der Poorten, 2004).

    I was trying to model an elliptic divisibility sequence in Python (using PyTorch), and that’s how I learnt of IBM’s patent.

  3. Numerical Analysts, Computational Scientists, and Sage/Maple Programmers

    Numerical analysis is the use of computer algorithms to approximate solutions to math and physics problems (Shi, 2024).

    Continued fractions are used in error analysis when evaluating integrals, and entire books describe these algorithms (Cuyt et al., 2008); a minimal sketch of one such evaluation scheme follows below.
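
As one example (my own sketch, assuming the modified Lentz algorithm as described in standard numerical-analysis references, not taken from the paper), here is a forward evaluation of a continued fraction that stops once successive convergents agree to a tolerance, applied to Lambert's continued fraction for tan(x):

import math

# Modified Lentz algorithm: evaluate f = b0 + a1/(b1 + a2/(b2 + ...)) term by term,
# stopping when the multiplicative update delta is within tol of 1.
def lentz(b0, terms, tol=1e-12, tiny=1e-30):
    f = b0 if b0 != 0 else tiny
    C, D = f, 0.0
    for a_j, b_j in terms:
        D = b_j + a_j * D
        D = tiny if D == 0 else D
        C = b_j + a_j / C
        C = tiny if C == 0 else C
        D = 1.0 / D
        delta = C * D
        f *= delta
        if abs(delta - 1.0) < tol:
            break
    return f

# Lambert's continued fraction: tan(x) = x / (1 - x^2/(3 - x^2/(5 - ...))).
def tan_cf(x, n=60):
    terms = [(x, 1.0)] + [(-x * x, 2.0 * k + 1.0) for k in range(1, n)]
    return lentz(0.0, terms)

print(tan_cf(1.0), math.tan(1.0))  # both approximately 1.5574077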
