Show HN: Keep your PyTorch model in VRAM by hot swapping code

Original link: https://github.com/valine/training-hot-swap/

This system enables fast PyTorch training iteration by keeping large models loaded in VRAM between script executions, avoiding costly reload times. It does this by launching a persistent backend server (`model_server.py`) that hosts the model. Instead of running your training script directly, you submit it with `client.py`, and the server executes the code using `eval()`. This lets you modify and rerun your training script without reloading the model. It is especially useful for remote development, offering a smoother experience than the buggy remote SSH interpreter. The system also supports sending DearImgui UI code to the server alongside your training script, enabling instant GUI launches for monitoring training progress. To use it, simply change your model loading to reference the globally managed `model` variable. Start the server and keep it alive, then submit scripts for execution with `client.py`. IntelliJ debugging is supported when running the server with a debug server on port 5678.


Original text

This is an example of how to hot-swap PyTorch training code without unloading your model weights from VRAM.

For large LLMs it can take upwards of 30 seconds to load a model from disk into VRAM. Waiting 30 seconds every time you want to rerun your script slows down development. This is a barebones implementation of a method to keep large models in VRAM even after your training script exits. If a model reload is necessary, it happens in the background after exit, ensuring the model is ready immediately the next time your script runs.

This works by spawning a second process that stays active after your target script exits. The script you change is not run directly. Instead, this background process runs the code on your behalf using Python's eval().
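
To make that concrete, here is a stripped-down sketch of the idea (hypothetical names and port, not the repo's actual model_server.py): a long-lived process keeps one persistent global namespace, so 'model' survives between submissions, and executes each submitted script inside that namespace.

import socket

GLOBALS = {}  # persistent namespace; 'model' lives here across runs

def serve(host="127.0.0.1", port=9090):
    with socket.create_server((host, port)) as server:
        while True:
            conn, _ = server.accept()
            with conn:
                chunks = []
                while chunk := conn.recv(65536):
                    chunks.append(chunk)
                source = b"".join(chunks).decode("utf-8")
                try:
                    # Run the submitted script; it sees the cached 'model'
                    exec(source, GLOBALS)
                except Exception as exc:
                    print(f"submitted script failed: {exc}")

if __name__ == "__main__":
    serve()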

This can also be used over a VPN for remote code execution. IntelliJ's remote SSH interpreter is quite buggy and not ideal for seamless remote development. Configure model_server.py to run on a remote machine, and run client.py on your development machine. Debugging with the IntelliJ debugger is supported in this configuration as well, enabling an almost seamless development experience with scripts that run instantly and are easily debuggable.


Some work has also been done to ensure compatibility with the DearImgui Python bindings. UI code can be submitted to the server along with your training script. I personally like to build out UI for my training scripts to monitor progress, loss over time, and enable easy evaluation. Submitting your UI code along with your training code ensures that your app will launch instantly.
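
The repo's UI code isn't shown here, but as a rough sketch of the pattern (assuming the pyimgui bindings with their GLFW integration plus PyOpenGL; every name below is illustrative, not taken from the repo), a minimal monitor that plots a loss curve looks like this:

from array import array

import glfw
import imgui
import OpenGL.GL as gl
from imgui.integrations.glfw import GlfwRenderer

def main():
    glfw.init()
    window = glfw.create_window(640, 480, "Training monitor", None, None)
    glfw.make_context_current(window)
    imgui.create_context()
    impl = GlfwRenderer(window)

    # Placeholder loss history; a real training loop would append to this
    losses = array("f", [1.0 / (step + 1) for step in range(100)])

    while not glfw.window_should_close(window):
        glfw.poll_events()
        impl.process_inputs()
        imgui.new_frame()

        imgui.begin("Training progress")
        imgui.plot_lines("loss", losses, graph_size=(300, 100))
        imgui.end()

        gl.glClearColor(0.1, 0.1, 0.1, 1.0)
        gl.glClear(gl.GL_COLOR_BUFFER_BIT)
        imgui.render()
        impl.render(imgui.get_draw_data())
        glfw.swap_buffers(window)

    impl.shutdown()
    glfw.terminate()

if __name__ == "__main__":
    main()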

Here's a GUI from an app that displays intermediate output of Mistral 7B. It takes about 0.32 seconds on my machine from when I run the code to when I can interact with the model, and that includes initialization time for the GUI.

[Screenshot from 2024-12-06: GUI showing intermediate Mistral 7B output]

As an aside, you can find more of my transformer visualizations here: https://x.com/lukasvaline


Set your model download location in model_server.py.

Compatible with IntelliJ debug server. Set your debug server port to 5678.
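
As a hedged illustration of that setup (the pydevd-pycharm package is an assumption about the environment, not something this repo ships), a submitted script can attach itself to the IDE's debug server like this:

import pydevd_pycharm

# Connect back to the IntelliJ/PyCharm debug server listening on port 5678
pydevd_pycharm.settrace(
    "localhost",          # machine where the IDE's debug server runs
    port=5678,
    stdoutToServer=True,  # mirror stdout/stderr into the IDE console
    stderrToServer=True,
    suspend=False,        # don't pause immediately; breakpoints still trigger
)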

To begin using this in your development, simply swap out your .from_pretrained call and reference the global variable 'model'.

This code goes away:

model = MistralForCausalLM.from_pretrained(
    self.model_path,
    torch_dtype=torch.float16,
    device_map=device,
    use_flash_attention_2=False,
    config=self.config,
)

And is replaced with:

def get_model(self):
    """Return the model from the global context, or None if it isn't loaded."""
    global model  # Reference the global model variable

    try:
        # Check whether the model exists in the global scope
        model
    except NameError:
        return None

    return model

model = get_model()
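
If get_model() returns None (first run, or a background reload is still in flight), a fallback load keeps the script self-contained. A sketch, not verbatim from the repo:

if model is None:
    # Nothing cached on the server yet: fall back to loading from disk
    model = MistralForCausalLM.from_pretrained(
        self.model_path,
        torch_dtype=torch.float16,
        device_map=device,
        use_flash_attention_2=False,
        config=self.config,
    )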

How to run: Launch the server and keep it running

training-hot-swap$ python model_server.py 

Submit the training code to the server

training-hot-swap$ python client.py ./src ./src/sample_train.py
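
For reference, a matching toy client (again hypothetical, mirroring the sketch server above rather than the repo's actual client.py) only needs to read the script and ship its source to the persistent process:

import socket
import sys

def submit(path, host="127.0.0.1", port=9090):
    # Read the training script and send its source to the server
    with open(path, "r", encoding="utf-8") as f:
        source = f.read()
    with socket.create_connection((host, port)) as conn:
        conn.sendall(source.encode("utf-8"))

if __name__ == "__main__":
    submit(sys.argv[1])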