Gate: AI and Automation Scenario Explorer

Original link: https://epoch.ai/gate

The GATE model is an integrated assessment model designed to explore the economic impact of AI development. It focuses on a feedback loop in which AI investment drives increases in compute, leading to task automation, higher output, and further AI investment. The model consists of three modules, with a playground for modifying parameters and observing scenario outcomes. Key parameters relate to AI/compute, technological R&D, and general economics; add-ons account for investor uncertainty, labor reallocation, and R&D externalities. GATE's predictions are conditional forecasts rather than concrete predictions: they highlight the qualitative dynamics of compute- and algorithm-driven AI automation, while the quantitative predictions are less reliable. The most important parameters include the AGI training requirements, the FLOP gap fraction, and returns to R&D. Users are advised to check unintuitive results by adjusting parameters and to report bugs to [email protected]. The model focuses on the path to full automation rather than its aftermath. It considers an abstract task space, assuming a spectrum of tasks that require progressively more compute to automate; the initial automation level is calibrated from the share of gross world product (GWP) currently allocated to runtime compute.


Original article

Overview

The Growth and AI Transition Endogenous (GATE) model is an integrated assessment model of the impact of AI development on the economy.

To accompany our paper describing the GATE model in detail, we have developed a playground that allows interested readers to modify parameter settings and observe the model’s behavior in a wide range of scenarios.

In this documentation, we describe the following:

  • A concise summary of how the GATE model is structured, implemented, and solved.
  • How to use the GATE playground, and interpret its predictions.
  • Explanations of the playground's default parameter settings.

If you would like to ask any questions or provide feedback about the GATE playground, you may contact us at [email protected]. You can also read our accompanying blogpost for an overview of the key results suggested by the model.


Model structure

The core dynamic in GATE is an automation feedback loop: Investments drive increases in the computation used to train and deploy increasingly capable AI systems, which in turn leads to the gradual automation of tasks currently performed by humans. This in turn increases output, which makes additional resources available for further investments into AI development. The model consists of three modules as described in the figure below:
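The feedback loop described above can be sketched as a minimal toy simulation. This is illustrative only: the functional forms, constants, and the `simulate` helper below are assumptions invented for the example, not the model's actual equations.

```python
import math

# Toy sketch of the GATE automation feedback loop (not the actual model).
# Assumed forms: output rises with the automated-task fraction, a fixed
# share of output is reinvested in compute, and the automated fraction
# grows with log-compute. All constants are illustrative.

def simulate(years=10, compute=1.0, automated=0.10):
    """Run a crude annual loop: investment -> compute -> automation -> output."""
    history = []
    for _ in range(years):
        output = 100.0 * (1.0 + automated)           # output rises with automation
        investment = 0.05 * output                   # fixed reinvestment share
        compute += investment                        # investment grows compute
        automated = min(1.0, 0.10 + 0.05 * math.log(compute))  # more compute, more automation
        history.append((compute, automated, output))
    return history

trajectory = simulate()
assert trajectory[-1][1] >= trajectory[0][1]  # automation never declines
```

Even this crude loop reproduces the qualitative story: reinvested output compounds compute, which ratchets up the automated fraction until it saturates.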


How to interpret the GATE model’s predictions

To use the GATE playground most effectively, it’s important to understand how its predictions are meant to be interpreted, and what the model’s limitations are. Notably, the model’s predictions are not meant to represent Epoch AI’s forecasts of future AI developments and economic impacts. As with any economic model, the GATE model’s predictions are instead conditional forecasts, depending on a range of assumptions both in terms of specifications and in terms of parameter values.

In particular, GATE is most useful for analyzing the high-level qualitative dynamics of AI automation, assuming that AI capabilities improvements are solely driven by increases in physical computation and better algorithms. Thus, GATE can be used for deriving stylized facts about the economic impacts of AI automation – in contrast, its quantitative predictions are substantially more uncertain and unreliable.

It is important to note that GATE's predictions may be subject to optimization errors. GATE is a complex economic model with a large number of parameters, and for certain parameter ranges its predictions become unreliable due to optimization issues. When results are unintuitive, it is especially important to verify whether they are simply artifacts of such issues. For example, one approach is to slightly perturb parameter settings and see whether the results change substantially. If you identify any such bugs, please email [email protected].
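The suggested perturbation check can be operationalized with a small helper. This is a sketch: `run_model` is a hypothetical stand-in for whatever callable produces your scenario's outputs, and the thresholds are arbitrary defaults.

```python
import random

def perturbation_check(run_model, params, key_output,
                       rel_eps=0.01, trials=5, tol=0.10):
    """Perturb each float parameter by ~1% and flag large output swings.

    run_model: hypothetical callable mapping a parameter dict to an output dict.
    Returns True if results look stable, False if a ~1% parameter perturbation
    moves the chosen output by more than tol (suggesting optimization noise).
    """
    baseline = run_model(params)[key_output]
    for _ in range(trials):
        perturbed = {k: v * (1 + random.uniform(-rel_eps, rel_eps))
                        if isinstance(v, float) else v
                     for k, v in params.items()}
        result = run_model(perturbed)[key_output]
        if abs(result - baseline) > tol * abs(baseline):
            return False  # tiny perturbation, large swing: suspect an optimization error
    return True
```

A scenario whose headline numbers survive this check is more likely to reflect the model's genuine dynamics than a solver artifact.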


Parameters

In this documentation we provide a brief description of each parameter – for more details (including the reasoning for individual parameter estimates), please refer to our technical paper.

We split the GATE model’s parameters into three categories:

  1. AI and compute: Parameters pertaining to the stock of effective compute, as well as to the automation of tasks
  2. Technological and R&D: Parameters relating to hardware and software R&D
  3. General economics: Parameters that are commonly used in macroeconomic models

Additionally, we outline the three main add-ons to the core model, which account for investor uncertainty, labor reallocation (or lack thereof), and positive externalities in R&D.

AI and compute parameters

Automation parameters
Runtime compute parameters
Compute investment parameters

Technological and R&D parameters

Hardware R&D parameters
Software R&D parameters

General economics parameters


FAQ

What is a FLOP?

A “FLOP” stands for a “FLoating-point OPeration”. This is a unit of computation commonly used when talking about training or running AI systems. Note that this is not the same as FLOP/s (often written as FLOPS), which refers to the number of FLOP per second.
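The distinction matters when converting between hardware throughput and total training computation. As a quick illustration (the numbers here are invented, not Epoch AI's estimates):

```python
# FLOP/s (throughput) x seconds = FLOP (total work done).
# Illustrative numbers only.
flops_per_second = 1e15         # a hypothetical 1 petaFLOP/s cluster
seconds_per_day = 24 * 3600
training_days = 100

total_flop = flops_per_second * seconds_per_day * training_days
assert total_flop == 8.64e21    # 1e15 FLOP/s x 86,400 s/day x 100 days
```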

What is effective compute?

“Effective compute” is a measure of the amount of computation available for training or running AI systems, adjusted to account for algorithmic progress. In particular, due to algorithm improvements, it becomes possible to do more with the same amount of computation, which “in effect” has the same impact as increasing the number of computations performed. We measure effective compute in units of effective FLOP (eFLOP), relative to the start of the simulation.
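A toy illustration of the idea, with invented multipliers (the specific numbers are not from the model):

```python
# Effective compute = physical compute x algorithmic-efficiency multiplier,
# measured relative to the start of the simulation. Numbers are illustrative.
physical_flop = 4.0           # 4x more physical computation than at t = 0
algorithmic_multiplier = 3.0  # algorithms make each FLOP worth 3x a t = 0 FLOP
effective_flop = physical_flop * algorithmic_multiplier
assert effective_flop == 12.0  # 12x the effective compute of the start
```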

What is effective labor?

Effective labor is the aggregation of the labor contributions of AI and human workers. It roughly corresponds to the number of human workers that would be needed to produce the same output without AI. It is described in depth in the Production section.

What if some tasks remain unautomated?

By default, we assume that all tasks are automatable given enough effective compute, although in principle some tasks could remain unautomatable. The beliefs add-on can be used to specify scenarios where full automation is not achievable (e.g. by placing 100% probability on a maximum task automation fraction of 0.8, instead of a maximum fraction of 1). As long as the fraction of tasks that can be automated is sufficiently large, the model dynamics up to full automation generally remain similar.

What are the most important parameters of the model?

In our experience with the model, the parameters that matter most are the AGI training requirements and FLOP gap fraction—which jointly determine the mapping between effective compute and tasks automated—and the hardware and software returns to R&D—which control how costly it is to scale effective compute. In the future we plan to release a sensitivity analysis that further clarifies which parameters most influence the model's outcomes.
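A stylized way to see how the AGI training requirements and the FLOP gap jointly pin down the compute-to-automation mapping is a log-linear interpolation between the easiest and hardest tasks. This is a simplification invented for illustration, and `automated_fraction`, along with its default thresholds, is hypothetical rather than the model's actual functional form.

```python
import math

def automated_fraction(effective_flop, agi_flop=1e36, flop_gap=1e12):
    """Stylized mapping from effective training compute to automated-task fraction.

    agi_flop: effective FLOP at which all tasks become automatable (illustrative).
    flop_gap: ratio of compute needed for the hardest vs. easiest task, so the
              easiest task is automated at agi_flop / flop_gap.
    The fraction interpolates log-linearly between the two thresholds.
    """
    easiest = agi_flop / flop_gap
    if effective_flop <= easiest:
        return 0.0
    if effective_flop >= agi_flop:
        return 1.0
    return math.log(effective_flop / easiest) / math.log(flop_gap)

# Halfway between the thresholds in log space, half the tasks are automated.
assert abs(automated_fraction(1e30) - 0.5) < 1e-9
```

Under this sketch, raising `agi_flop` shifts the whole curve right (automation arrives later), while widening `flop_gap` stretches it out (automation arrives more gradually), which is the qualitative role these two parameters play.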

How did you estimate the parameters for the default parameter preset?

What do the aggressive and conservative presets correspond to?

They correspond to example scenarios in which we have adjusted some of the model's most important parameters to more extreme but still plausible values. In particular, we change the values of the AGI training requirements and the hardware and software R&D returns; the rest of the parameters are kept at their default values. These presets are meant to showcase the range of scenarios that GATE can illustrate.

What happens after full automation?

The main focus of GATE is on the dynamics in the leadup towards full automation, and it is likely to make poor predictions about what happens close to and after full automation. For example, in the model the primary value of training compute is in increasing the fraction of automated tasks, so once full automation is reached the compute dedicated to training falls to zero. However, in reality there may be economically valuable tasks that go beyond those that humans are able to perform, and for which training compute may continue to be useful.

Why don’t the model predictions for GWP growth, capital stock, or compute investment match the values of today?

The GATE model uses gradient descent to solve for the optimal allocation of investments and compute, in order to maximize a utility function. The predicted investments near the start of simulations can therefore be quite different from those observed today.

Does GATE take a stance on which tasks will be automated first?

In the model we consider an abstract task space comprising all economically useful tasks, both cognitive and physical. However, in GATE we do not take a stance on which tasks are likely to be automated first. Instead, GATE only assumes that there is a fixed spectrum of tasks that require progressively more effective compute at training and inference to be automated. We do not consider bottlenecks inherent to specific tasks, such as physical tasks that require robotics.

Why is the initial fraction of automated tasks nonzero?

We do this so that the initial value of runtime compute is nonzero. Specifically, we calibrate the initial fraction based on the current share of GWP that is allocated to runtime compute, which results in approximately 10% of tasks being initially automated.
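The calibration logic can be illustrated with a back-of-the-envelope calculation. Every number and the per-task cost-share assumption below are invented for the example; they are not Epoch AI's actual figures or method.

```python
# Toy calibration: pick the initial automated fraction so that the implied
# spending on runtime compute matches its observed share of GWP.
# All numbers are illustrative.

gwp = 100e12                     # hypothetical gross world product, $100T
runtime_compute_spend = 0.1e12   # hypothetical annual spend on AI inference compute

# Toy assumption: running AI on a task costs ~1% of that task's output
# in runtime compute, so the observed spend pins down the fraction.
compute_cost_share_per_task = 0.01
initial_fraction = runtime_compute_spend / (gwp * compute_cost_share_per_task)
assert abs(initial_fraction - 0.10) < 1e-9  # ~10% of tasks initially automated
```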


Acknowledgements

Roles and Contributions

Ege Erdil initiated the project, developed the early prototype, and played a central role in advancing the key theoretical and modeling ideas. Andrei Potlogea contributed significantly to the technical exposition and introduced refinements to the economic model. Tamay Besiroglu coordinated the project, contributed to the writing, and ensured alignment across modeling, engineering, and writing efforts. Anson Ho provided ongoing support throughout the project, including calibration of parameter values, general model refinement, and coordinating external feedback. Jaime Sevilla contributed to both the engineering and writing, ensuring coherence between the model’s implementation and its conceptual framework. Matthew Barnett contributed to the writing and parameter settings.

Engineering and Sandbox Development

Edu Roldan provided technical support on the development of the model. Matej Vrzala contributed to the design of and implemented the interactive sandbox. Andrew Souza supported the implementation of the interactive sandbox. Robert Sandler provided design support, contributing to the usability and presentation of the sandbox interface.

We are grateful to Tyler Cowen, Chad Jones, Ben Golub, Ryan Greenblatt, Kevin Kuruc, Caroline Falkman Olsson, Anton Korinek, Daniel Kokotajlo, Lev McKinney, Daan Jujin, Zachary Brown and Dan Valentine, as well as seminar attendees at the 15th Oxford workshop on Global Priorities Research for their insights and feedback.
