Sutton and Barto book implementation

Original link: https://github.com/ivanbelenky/RL

This Python package provides practical implementations of the reinforcement learning algorithms from Sutton's book "Reinforcement Learning: An Introduction." It offers modular code covering a range of topics, focused on model-free solvers that work with user-defined states, actions, and transition functions (which define state transitions, rewards, and episode termination). The codebase includes examples such as Example 5.5, which demonstrates off-policy Monte Carlo methods (ordinary and weighted) in a single-state, infinite-variance setting. It also showcases Monte Carlo Tree Search (MCTS) for solving a maze, complete with a visualization of the search tree. Although fully functional, the code prioritizes clarity over optimization, and community contributions are encouraged. It offers a useful starting point for anyone learning reinforcement learning and implementing the book's algorithms from scratch. Note that this code is not intended for production use.

A Hacker News thread highlights user ivanbelenky's implementation of the algorithms from Sutton and Barto's "Reinforcement Learning: An Introduction." The original post links to the implementation on GitHub. Commenters, including mark_l_watson, praised the implementation and offered additional resources: the official Lisp and Python examples, links to other GitHub repositories with reinforcement learning implementations, and the Coursera specialization taught by professors White & White. ivanbelenky modestly acknowledged the exploratory nature of the code and its lack of rigorous testing. The comments form a valuable collection of resources for anyone learning and implementing the book's reinforcement learning algorithms, including examples, exercises, and various implementations using different frameworks and libraries.

Original

License: MIT · Python 3.8

$ python setup.py install

This repository contains code that implements algorithms and models from Sutton and Barto's book on reinforcement learning. The book, titled "Reinforcement Learning: An Introduction," is a classic text on the subject and provides a comprehensive introduction to the field.

The code in this repository is organized into several modules, each of which covers a different topic.

All model-free solvers work just by defining states, actions, and a transition function. The transition function takes a state and an action and returns a tuple containing the next state and the reward, along with a boolean indicating whether the episode has terminated.

states: Sequence[Any]
actions: Sequence[Any]
transition: Callable[[Any, Any], Tuple[Tuple[Any, float], bool]]
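
For instance, a minimal two-state chain conforming to this interface (purely illustrative, not part of the repository) could look like the following:

# Hypothetical two-state chain matching the interface above:
# 'right' from state 0 reaches the terminal state 1 with reward 1,
# while 'left' stays in state 0 with reward 0.
states = [0, 1]
actions = ['left', 'right']

def chain_transition(state, action):
    if action == 'right':
        return (1, 1.0), True    # (next state, reward), episode terminates
    return (0, 0.0), False       # (next state, reward), episode continues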

Single State Infinite Variance Example 5.5

import numpy as np

from mypyrl import off_policy_mc, ModelFreePolicy

states = [0]
actions = ['left', 'right']

def single_state_transition(state, action):
    # 'right' terminates immediately with reward 0
    if action == 'right':
        return (state, 0), True
    # 'left' terminates with reward 1 with probability 0.1,
    # otherwise loops back to the single state with reward 0
    if action == 'left':
        threshold = np.random.random()
        if threshold > 0.9:
            return (state, 1), True
        else:
            return (state, 0), False

b = ModelFreePolicy(actions, states)   # behavior policy, equiprobable by default
pi = ModelFreePolicy(actions, states)
pi.pi[0] = np.array([1, 0])            # target policy always chooses 'left'

# estimate the state value function with ordinary and weighted importance sampling
vqpi_ord, samples_ord = off_policy_mc(states, actions, single_state_transition,
    policy=pi, b=b, ordinary=True, first_visit=True, gamma=1., n_episodes=1E4)

vqpi_w, samples_w = off_policy_mc(states, actions, single_state_transition, 
    policy=pi, b=b, ordinary=False, first_visit=True, gamma=1., n_episodes=1E4)

Monte Carlo Tree Search: maze solving with a search tree plot

s = START_XY            # start position in the maze
budget = 500            # simulations allotted to MCTS per move
cp = 1/np.sqrt(2)       # UCT exploration constant
end = False
max_steps = 50
while not end:
    # plan one action with MCTS, then execute it in the maze
    action, tree = mcts(s, cp, budget, obstacle_maze, action_map, max_steps, eps=1)
    (s, _), end = obstacle_maze(s, action)

tree.plot()
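
START_XY, obstacle_maze, and action_map above come from elsewhere in the repository. Purely as an illustration of the interface they have to satisfy (the same ((next_state, reward), done) convention as the transition function), a hypothetical grid maze might be set up as follows; the layout, names, and rewards here are assumptions, not the repository's actual maze:

import numpy as np

# Illustrative grid and layout only -- not the repository's actual maze.
GRID = np.zeros((6, 9), dtype=int)
GRID[1:4, 5] = 1                                   # obstacle cells
START_XY, GOAL_XY = (2, 0), (0, 8)
action_map = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def obstacle_maze(state, action):
    dy, dx = action_map[action]
    y, x = state[0] + dy, state[1] + dx
    # moves into walls or off the grid leave the agent where it is
    if not (0 <= y < GRID.shape[0] and 0 <= x < GRID.shape[1]) or GRID[y, x]:
        y, x = state
    done = (y, x) == GOAL_XY
    return ((y, x), 1.0 if done else 0.0), done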

While the code in this package provides a basic implementation of the algorithms from the book, it is not necessarily the most efficient or the best written. If you have suggestions for improving the code, please feel free to open an issue.

Overall, this package provides a valuable resource for anyone interested in learning about reinforcement learning and implementing the algorithms from scratch. It is by no means production ready.
