GT: [Experimental] Multiplexing Tensor Framework
GT – Experimental multiplexing tensor framework for distributed GPU computing

Original link: https://github.com/bwasti/gt

## GT: A Dynamic, Asynchronous Framework for Distributed GPU Computing

GT is a novel Python framework that aims to simplify distributed GPU computing for machine learning, breaking away from the traditional lock-step approach. Borrowing ideas from multi-core operating systems, it adopts dynamic scheduling and asynchronous execution while presenting a familiar, eager-mode API similar to PyTorch's.

The system consists of clients (the users), a dispatcher, and workers (one per GPU). Clients define tensor operations; the dispatcher translates them into GPU execution instructions, sharding data according to YAML configuration and client "signals"; workers process the instructions asynchronously, optionally JIT-compiling them.

Key features include high-performance transport via ZeroMQ, tape-based automatic differentiation, PyTorch compatibility, and tooling for real-time monitoring, instruction logging, and AI-assisted development. GT prioritizes readability and debuggability, aiming to be easy to understand and modify with the help of AI coding assistants. It is a research prototype focused on simplicity, and contributions are welcome.

From the Hacker News discussion (30 points, 1 comment): almostgotcaught wrote, "Bram always builds really elegant things."

Original article

An experimental multiplexing tensor framework for distributed GPU computing.

[gt_viz animation]

pip install git+https://github.com/bwasti/gt.git
python -c 'import gt; print(gt.randn(2,2))'

The motivation for this project is a rejection of the clunky lock-step paradigm ML researchers tend to use. GT pulls in ideas from decades of development on multi-core operating systems, fully embracing dynamic scheduling and heavily asynchronous execution while presenting a familiar eager frontend.

  • Three components
    • N × clients (as many users as you want!)
    • 1 × dispatcher (for coordinating)
    • N × workers (1 per GPU)
  • Everything communicates with a stream of instructions
    • Clients deal with math. They emit (GPU-unaware) pure functional instructions
    • The dispatcher rewrites these instructions on the fly to be GPU-aware and sends them to the workers
    • Workers asynchronously process these instructions, optionally JIT compiling
  • Instruction streams are annotated
    • Clients can send "signals" which let the dispatcher shard tensors more appropriately (see the sketch after this list)
    • Dispatchers annotate "hot" paths to give hints to workers about JIT compiling
    • Annotations are supplemented with YAML configs that specify sharding and compilation information
    • Every annotation can be safely ignored, so the same code can run anywhere (just remove the YAML)
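For illustration, here is a hedged sketch of signal annotation: gt.signal.context appears in the architecture diagram further down, but the exact YAML schema it pairs with is not shown on this page, so treat the details as assumptions.

import gt

# Annotate a region of the instruction stream with a named signal.
# gt.signal.context is shown in the architecture diagram below; if no
# YAML config refers to 'layer1', the annotation is safely ignored.
with gt.signal.context('layer1'):
    x = gt.randn(100, 64)
    w = gt.randn(64, 64)
    y = x @ w

A complete end-to-end example: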
import gt

# Build a computation with the eager, PyTorch-like API.
a = gt.randn(1000, 1000)
b = gt.randn(1000, 1000)
c = a @ b           # matrix multiply, dispatched to a worker
result = c[:4, :4]  # slice a small 4x4 view of the product
print(result)       # viewing the result forces it to be materialized on the client

It may not look like it, but in the background GT automatically spins up an asynchronous dispatching server and a GPU worker.

  • High-performance transport - ZeroMQ (ZMQ) with automatic message batching and efficient DEALER/ROUTER pattern
  • Autograd support - Tape-based automatic differentiation, handled exclusively at the client layer (see the sketch after this list)
  • PyTorch-compatible API - Familiar syntax for tensor operations
  • Signal-based sharding - Declarative YAML configuration for distributed training
  • Real-time monitoring - htop-style visualization of worker activity
  • Instruction logging - Debug distributed execution with timeline visualizations
  • AI-assisted development - Optimized for collaboration with AI coding assistants
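
As a hedged sketch of the autograd workflow: loss.backward() appears in the architecture diagram below, while requires_grad, sum(), and .grad are assumptions based on the PyTorch-compatible API rather than names documented on this page.

import gt

# Hypothetical autograd usage; flag and attribute names are assumed
# from the PyTorch-compatible API, not confirmed by this page.
w = gt.randn(64, 64, requires_grad=True)  # assumed PyTorch-style flag
x = gt.randn(100, 64)
loss = (x @ w).sum()  # forward ops recorded on the client-side tape
loss.backward()       # tape walked in reverse to produce gradients
print(w.grad)         # assumed PyTorch-style gradient attribute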

📚 Read the full documentation

See examples/ directory for demonstrations:

  • demo.py - Basic tensor operations
  • signal_demo.py - Signal-based sharding
  • compile_demo.py - Compilation directives
  • debug_demo.py - Debug utilities
  • visualize_demo.py - Instruction tape visualization
┌─────────────────────────────────────────────────────────────────┐
│                          User Code                              │
│  import gt                                                      │
│  with gt.signal.context('layer1'):                              │
│      x = gt.randn(100, 64)                                      │
│      loss = model(x)                                            │
│      loss.backward()                                            │
└──────────────────────┬──────────────────────────────────────────┘
                       │ PyTorch-like API + Signal Metadata
                       │
┌──────────────────────▼──────────────────────────────────────────┐
│                      gt/client/                                 │
│  ┌──────────────┐  ┌─────────────┐  ┌──────────────┐            │
│  │   Tensor     │  │  Autograd   │  │  nn.Module   │            │
│  │ (Remote Data)│  │   (Tape)    │  │  (Layers)    │            │
│  └──────────────┘  └─────────────┘  └──────────────┘            │
└──────────────────────┬──────────────────────────────────────────┘
                       │ ZMQ (DEALER → ROUTER)
                       │
┌──────────────────────▼──────────────────────────────────────────┐
│                    gt/dispatcher/                               │
│  • ZMQ ROUTER socket handles all connections                    │
│  • Reads signal configs from YAML                               │
│  • Routes operations based on sharding strategy                 │
│  • Logs instruction stream to file                              │
│  • Handles multiple clients concurrently                        │
└───────┬──────────────┬──────────────┬───────────────────────────┘
        │              │              │ ZMQ (DEALER ← ROUTER)
        │              │              │
    ┌───▼────┐    ┌───▼────┐    ┌───▼────┐
    │Worker 0│    │Worker 1│    │Worker N│ (1 per GPU)
    │PyTorch │    │PyTorch │    │PyTorch │
    │  GPU   │    │  GPU   │    │  GPU   │
    └────────┘    └────────┘    └────────┘

Optimized for AI Development

GT is designed to be understood, modified, and debugged with AI coding assistants:

  • CLAUDE.md - Detailed architecture documentation for AI assistants
  • Declarative YAML configs - Easy for AI to parse and generate
  • Tape-based debugging - Inspect computation graphs with gt.debug.print_tape() (see the sketch after this list)
  • Instruction logging - Track every operation with timestamps
  • Comprehensive test suite - 50+ tests serving as executable specifications
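
A minimal debugging sketch: gt.debug.print_tape() is the entry point named in the list above, while the surrounding tensor calls reuse the PyTorch-like API from earlier examples.

import gt

# Record a couple of operations, then dump the client-side
# computation tape for inspection.
a = gt.randn(32, 32)
b = a @ a
gt.debug.print_tape()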

Contributions welcome! This is a research prototype focused on simplicity and readability.

See Contributing Guide for development workflow, testing, code style, and PR guidelines.

MIT

See License for details.
