使用 E-图 专精 Python
Specializing Python with E-Graphs

原始链接: https://vectorfold.studio/blog/egglog

这段文字描述了一个使用egglog库进行项重写和优化的基础Python表达式编译器的创建过程,最终编译成MLIR。 编译器首先将Python函数解析成表达式树,然后使用带有e-graph的等式饱和度来探索基于重写规则(例如数学恒等式)的等效表达式形式。成本模型指导提取最有效的形式。然后,编译器将这个优化的表达式转换成MLIR代码,进一步降低到LLVM IR。llvmlite库动态加载并执行编译后的LLVM代码。 该编译器使用自定义表达式模型来表示数学运算,处理函数解释和参数类型注释,并包含一个作为用户界面装饰器的调度器,用于编译和执行带有NumPy数组的Python函数。这个项目演示了使用e-graph进行编译器优化的核心概念,为构建更复杂的MLIR编译器奠定了基础。

Hacker News 最新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 使用 E-Graphs 特化 Python (vectorfold.studio) dtseng123 2小时前 7 分 | 隐藏 | 过去 | 收藏 | 讨论 加入我们,参加 6 月 16-17 日在旧金山举办的 AI 初创公司学校! 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系我们 搜索:

原文

We've explored progressively more sophisticated techniques for optimizing numerical computations. We started with basic MLIR concepts, moved through memory management and linear algebra, and then neural network implementations. Each layer has added new capabilities for expressing and optimizing computations. Now we're reading to build our first toy compiler for Python expressions.

In this section, we'll explore how to use the egglog library to perform term rewriting and optimization on Python expressions and compile them into MLIR.

The entire source code for this section is available on GitHub.

Equality Saturaiton and E-Graphs

Before we dive into the implementation, let's review the key concepts of equality saturation and e-graphs.

Take as an example if we have the rewrites.

  1. x * 2x << 1
  2. x*y/xy

And we try to apply it over the expression (a * 2)/2 becomes (a << 1)/2. However we should have cancelled the 2 in the numerator and denominator and got a which results in a simpler expression. The order of rewrites is important and we want to find an optimal order of rewrites that reduces the expression to a form according to a cost function. This is called the phase ordering problem.

The egg library employs an approach that involves exhaustively applying all possible rewrites to an expression, effectively addressing the phase ordering problem through the use of an e-graph. This approach allows for the exploration of all possible rewrites, followed by the extraction of the most optimal form of the expression.

In linear algebra for example, matrix operations with NumPy like transpose, multiplication, are quite expensive because they involve touching every element of the matrix. But there is a wide range of identities that can be applied to reduce the number of operations.

Compilers like LLVM and even the linalg dialect of MLIR doesn't know about these identities and so can't necessarily abstract away the expensive operations by applying rewrites. However at a high-level (our core language) we can use e-graph to produce much more efficient tensor manipulation operations before lowering them into MLIR.

For example, the following identities are quite common in linear algebra:

(AB)T=BTAT(A B)^T = B^T A^T
联系我们 contact @ memedata.com