TycoonLE: A Jax reinforcement learning environment for long-horizon planning

Original link: https://github.com/vrtnis/tycoon-learning-environment

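**Tycoon Learning Environment (TycoonLE)** is a reinforcement learning platform designed for economically grounded, long-horizon planning. In a simulated logistics economy, agents must master capital allocation, route construction, cargo management, and debt financing to optimize delayed returns. TycoonLE is built for high-performance research: its fixed-shape interface is compatible with JAX transformations such as `jit`, `vmap`, and `scan`, which supports efficient large-scale training. The environment emphasizes decision challenges such as action legality, financing timing, and procedural variation. Key features:

* **Inspectability:** A browser-based UI lets users visualize and audit policy behavior, cargo flow, and profitability through replayable JSON trace files.
* **Benchmarking:** Performance can be measured with the companion TycoonBench suite.
* **Ease of use:** The framework installs directly and includes examples for training a PPO agent and running the test suite.

TycoonLE is aimed at researchers interested in complex planning and provides the tools needed to test agent decision-making under economic constraints. For detailed benchmarks or to contribute, visit [vrtnis.github.io/tycoonbench](https://vrtnis.github.io/tycoonbench).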

Hacker News comment from the submitter, vrtnis: "Inspired by OpenTTD, agents can build routes, ship cargo, manage debt, and optimize delayed returns in a transport economy."

Original text

Tycoon Learning Environment (TycoonLE) is a reinforcement learning environment for economically grounded, long-horizon planning. Agents operate in a simulated logistics economy where they allocate capital, build transport routes, move cargo, manage debt, and optimize delayed returns.

It is designed to study action legality, candidate-frontier decision interfaces, financing timing, delayed rewards, procedural variation, and replayable audit traces.
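To make the link between financing timing and delayed rewards concrete, here is a minimal sketch of a discounted return over a short episode. It is not taken from TycoonLE: the function name `discounted_return` and the per-step profit figures are illustrative assumptions, and the environment's actual reward definition may differ.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step profits (illustrative numbers only):
# borrow and build a route at step 0, pay interest for two steps,
# then collect cargo revenue for three steps.
build_now = [-100.0, -5.0, -5.0, 40.0, 40.0, 40.0]
# Alternative: do nothing for the whole episode.
wait = [0.0] * len(build_now)

for gamma in (0.5, 0.99):
    better = discounted_return(build_now, gamma) > discounted_return(wait, gamma)
    print(f"gamma={gamma}: building beats waiting -> {better}")
```

With a short effective horizon (small `gamma`) the up-front cost dominates and waiting looks better; with a long horizon the later revenue outweighs the cost. This is the kind of trade-off a long-horizon agent has to learn.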

TycoonLE uses a fixed-shape interface. Agents choose among valid route, finance, and wait candidates, making rollouts compatible with JAX transformations such as jit, vmap, and scan.
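The sketch below illustrates why a fixed-shape candidate interface composes well with `jit` and `vmap`: every environment exposes the same number of candidate slots, and a boolean mask marks which ones are currently valid. It is a standalone toy that does not import TycoonLE. The candidate count, the uniform logits, and the helper `sample_valid_action` are assumptions made for illustration; only the idea of an action mask comes from the README.

```python
import jax
import jax.numpy as jnp

NUM_CANDIDATES = 8  # hypothetical fixed number of candidate slots
NUM_ENVS = 4        # hypothetical batch of parallel environments


def sample_valid_action(key, logits, action_mask):
    """Sample one candidate index, restricted to slots where action_mask is True."""
    # Invalid slots get a logit of -inf, so they are never selected.
    masked_logits = jnp.where(action_mask, logits, -jnp.inf)
    return jax.random.categorical(key, masked_logits)


# vmap batches the function over environments; jit compiles it once,
# which works because every input has a static shape.
batched_sample = jax.jit(jax.vmap(sample_valid_action))

keys = jax.random.split(jax.random.PRNGKey(0), NUM_ENVS)
logits = jnp.zeros((NUM_ENVS, NUM_CANDIDATES))  # stand-in for policy outputs
# Each environment has a different number of valid candidates (1, 3, 5, 8),
# but the mask always has shape (NUM_ENVS, NUM_CANDIDATES).
num_valid = jnp.array([1, 3, 5, 8])
action_mask = jnp.arange(NUM_CANDIDATES)[None, :] < num_valid[:, None]

actions = batched_sample(keys, logits, action_mask)
print(actions)
```

Because the number of valid choices varies only through the mask and never through array shapes, the same compiled function serves every step of a rollout.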

The replay UI makes policies inspectable through route choices, cargo flow, financing behavior, reward, score, and profit over time.

TycoonBench provides a companion benchmark report for comparing agent and model performance on TycoonLE planning tasks: vrtnis.github.io/tycoonbench.

Use Python 3.11 or 3.12 (the commands below use Windows paths):

py -3.12 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[test]"
npm install

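Quickstart:

import jax
from tycoonle_jax import TycoonLE

# Create an environment on the "dev" split with the "chain" family.
env = TycoonLE(split="dev", family="chain")
# reset takes a PRNG key and returns the initial state and first timestep.
state, timestep = env.reset(jax.random.PRNGKey(0))
# argmax over the mask selects the first candidate marked as valid.
action = timestep.observation.action_mask.argmax()
state, timestep = env.step(state, action)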

Export a replay:

.\.venv\Scripts\python.exe examples\quickstart.py
npm run dev

Open the browser UI and load runs/quickstart/replay.json.
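Before loading a trace into the UI, it can help to look at its structure from Python. The replay schema is not documented in this README, so the sketch below makes no assumptions about field names: it only reports the nesting of whatever JSON it finds. The helper `describe` and the in-memory `sample` are hypothetical stand-ins, not part of TycoonLE, and `sample` does not reflect the real replay format.

```python
import json
from pathlib import Path


def describe(value, depth=0, max_depth=3):
    """Return lines that outline the nesting of a decoded JSON value."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [f"{pad}object with {len(value)} keys"]
        if depth < max_depth:
            for key, child in value.items():
                lines.append(f"{pad}  {key}:")
                lines.extend(describe(child, depth + 2, max_depth))
        return lines
    if isinstance(value, list):
        lines = [f"{pad}array of {len(value)} items"]
        if value and depth < max_depth:
            # Describe only the first item as a representative.
            lines.extend(describe(value[0], depth + 1, max_depth))
        return lines
    return [f"{pad}{type(value).__name__}"]


# Made-up stand-in so the sketch runs without a real trace file.
sample = json.loads('{"steps": [{"reward": 1.0}]}')
sample_outline = describe(sample)
print("\n".join(sample_outline))

# If the quickstart has been run, outline the real trace as well.
replay_path = Path("runs/quickstart/replay.json")
if replay_path.exists():
    with replay_path.open(encoding="utf-8") as handle:
        print("\n".join(describe(json.load(handle))))
```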

Run tests:

.\.venv\Scripts\python.exe -m pytest
npm run build

Run a small PPO smoke train:

.\.venv\Scripts\python.exe examples\train_ppo_jax.py --updates 1 --num-envs 4 --rollout-length 4 --update-epochs 1 --hidden-sizes 32

If you find this work useful, consider citing:

@software{tycoonle,
  title = {TycoonLE},
  author = {TycoonLE contributors},
  year = {2026},
  url = {https://github.com/vrtnis/tycoon-learning-environment}
}

TycoonLE uses sprite artwork from OpenGFX, an open-source graphics base set for OpenTTD.
