ATLAS：大规模自动形式化教材库

ATLAS：大规模自动形式化教材库
ATLAS: Autoformalized Textbook Library At Scale

原始链接: https://github.com/facebookresearch/atlas-lean

ATLAS 是一个使用 Lean 4 语言编写的大规模自动形式化数学库。该项目由“AutoformBot”流水线生成，涵盖了代数、分析、几何和理论计算机科学等多个学科，翻译了 26 本本科及研究生水平教科书中的定义、陈述和证明。作为一项持续的研究工作，ATLAS 提供了一个可重用的形式化构建块存储库，旨在加速人类和机器驱动的形式化进程。目前的库包含超过 63 万行代码，拥有 46,203 个声明，自动证明成功率达 92.7%。用户可以通过在线可视化工具访问该库，进行浏览、对比非形式化教科书陈述与 Lean 对应项，并检查逻辑依赖关系。每个书籍目录都包含源文件、目标陈述以及针对准确性和代码质量的自动评估指标。由 Ahmad Rammal 等人领导的 ATLAS 团队正持续扩展该语料库，提升可维护性，并使代码与标准的 Mathlib 规范保持一致。他们欢迎外部贡献，以帮助扩展和完善这一日益增长的形式化数学知识集合。

对不起。

ATLAS is a Lean 4 library of textbook mathematics autoformalized by LLMs: informal statements and proofs translated into Lean code. It draws from undergraduate and graduate textbooks across analysis, algebra, geometry, topology, combinatorics, probability, statistics, PDEs, number theory, and theoretical computer science.

The goal of ATLAS is to provide reusable formal building blocks for future human- and machine-driven Lean formalization. This is an active effort: we are continuing to scale to more sources, curate the generated material, improve coverage and maintainability, and move the library closer to Mathlib conventions.

ATLAS was generated with AutoformBot, our autoformalization pipeline.

Each book directory under Atlas/ contains:

Lean source files for the generated definitions, statements, and proofs.
targets.yaml, listing the textbook statements selected for formalization.
report.json, containing automated evaluation results for matched Lean declarations, including faithfulness, proof-integrity, and code-quality scores.

A visualizer is provided at https://rammalahmad.github.io/atlas/. It allows users to browse ATLAS, compare informal statements with their Lean formalizations, inspect logical dependency graphs between results, and extract the Lean code needed to state a selected theorem.

ATLAS is an ongoing, machine-generated extension effort rather than a finished product. We are actively working on scaling the corpus, curating the generated code, formalizing remaining statements, and improving idiomatic Mathlib reuse. External contributions are welcome!

To build the full library with the pinned Lean and Mathlib versions, run:

26 books · 630,999 lines of code · 483,917 lines of Lean code (excl. comments/blanks) · 46,203 declarations · 42,837 proved (92.7%) · 2,855 / 4,007 statements formalized (71.3%) · 183,157M tokens

Book	Target Statements	Formalized	% Formalized	Lines of Code	Lines of Lean	Declarations	Proved	% Proved	Tokens (M)
AlgebraNotes	176	151	85.8%	5,037	4,409	274	261	95.3%	1,962.99
AlgebraicCombinatorics	39	37	94.9%	10,695	9,343	737	734	99.6%	1,440.73
AlgebraicGeometryI	186	112	60.2%	40,678	27,393	4,499	4,210	93.6%	7,629.26
AlgebraicTopologyI	171	110	64.3%	29,154	20,142	2,416	2,063	85.4%	10,323.27
AnAlgorithmistsToolkit	158	131	82.9%	9,656	8,234	712	668	93.8%	2,004.00
ArithmeticGeometry	335	266	79.4%	39,257	29,573	3,047	2,861	93.9%	11,100.62
BooleanFunctions	108	44	40.7%	9,516	7,949	667	614	92.1%	2,327.49
Buildings	74	44	59.5%	64,383	48,809	4,345	4,247	97.7%	20,442.93
CombinatorialOptimization	36	22	61.1%	8,908	7,934	428	414	96.7%	2,475.65
ComplexVariables	38	37	97.4%	7,231	6,225	285	280	98.2%	1,250.91
DifferentialAnalysis	113	88	77.9%	31,302	23,713	1,634	1,506	92.2%	11,743.27
DifferentialGeometry	147	112	76.2%	10,592	8,942	888	781	88.0%	1,933.97
EllipticCurves	360	212	58.9%	32,819	22,316	3,483	2,981	85.6%	11,058.00
FourierAnalysis	38	34	89.5%	7,943	6,671	373	359	96.2%	1,185.90
GeometryOfManifolds	72	40	55.6%	22,686	16,408	3,251	3,098	95.3%	6,864.93
HighDimensionalStatistics	73	65	89.0%	39,656	31,715	1,564	1,518	97.1%	975.36
IntroductionToFunctionalAnalysis	72	68	94.4%	2,709	2,006	113	109	96.5%	553.64
IntroductionToPartialDifferentialEquations	105	86	81.9%	27,666	20,740	1,585	1,414	89.2%	2,972.23
LieGroups	185	74	40.0%	60,285	50,594	4,219	3,814	90.4%	45,384.33
NumberTheoryI	576	460	79.9%	64,958	54,760	3,764	3,591	95.4%	15,424.36
ProbabilisticMethodsInCombinatorics	210	109	51.9%	20,555	15,604	1,272	1,089	85.6%	2,720.15
ProjectionTheory	111	73	65.8%	13,357	9,672	979	871	89.0%	2,678.00
RealAnalysis	177	175	98.9%	2,886	2,224	149	147	98.7%	585.64
TensorCategories	229	137	59.8%	42,812	29,729	3,373	3,176	94.2%	11,338.45
TheoryOfComputation	118	84	71.2%	15,094	10,581	1,553	1,482	95.4%	3,580.36
TheoryOfProbability	100	84	84.0%	11,164	8,231	593	549	92.6%	3,200.61
Total	4,007	2,855	71.3%	630,999	483,917	46,203	42,837	92.7%	183,157

The initial ATLAS effort was led by Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat, Julia Kempe, Remi Munos, Charles Arnal, and Vivien Cabannes.

If you find this work useful, please cite our paper:

@misc{rammal2026formalizingmathematicsscale,
      title={Formalizing Mathematics at Scale}, 
      author={Ahmad Rammal and Niket Patel and Fabian Gloeckle and Amaury Hayat and Julia Kempe and Remi Munos and Charles Arnal and Vivien Cabannes},
      year={2026},
      eprint={2605.29955},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.29955}, 
}

ATLAS：大规模自动形式化教材库 ATLAS: Autoformalized Textbook Library At Scale

ATLAS：大规模自动形式化教材库
ATLAS: Autoformalized Textbook Library At Scale