An experimental multiplexing tensor framework for distributed GPU computing.
```bash
pip install git+https://github.com/bwasti/gt.git
python -c 'import gt; print(gt.randn(2,2))'
```

The motivation for this project is a rejection of the clunky lock-step paradigm ML researchers tend to use. GT borrows ideas from decades of development on multi-core operating systems: it fully embraces dynamic scheduling and heavily asynchronous execution while presenting a familiar eager frontend.
- Three components:
  - N × clients (as many users as you want!)
  - 1 × dispatcher (for coordinating)
  - N × workers (1 per GPU)
- Everything communicates with a stream of instructions:
  - Clients deal with math; they emit (GPU-unaware) pure functional instructions
  - The dispatcher rewrites these instructions on the fly to be GPU-aware and sends them to the workers
  - Workers asynchronously process these instructions, optionally JIT compiling
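The rewriting step can be sketched in plain Python (the instruction tuples and the `rewrite` function here are hypothetical illustrations, not GT's actual wire protocol): device-unaware instructions come in, and each one leaves carrying a concrete worker assignment.

```python
# Toy sketch of dispatcher rewriting (hypothetical format, not GT's protocol):
# the client emits pure functional, device-unaware instructions, and a
# dispatcher pass assigns each one a worker before forwarding it.

def rewrite(instructions, num_workers):
    """Attach a worker assignment to each instruction, round-robin by default,
    but keep an op on the worker that already holds its first input."""
    placed = {}  # tensor name -> worker index
    rewritten = []
    for i, (op, out, *ins) in enumerate(instructions):
        worker = placed.get(ins[0]) if ins else None
        if worker is None:
            worker = i % num_workers
        placed[out] = worker
        rewritten.append((op, out, ins, {"worker": worker}))
    return rewritten

# Client-side stream for `c = a @ b` -- note no GPU is mentioned anywhere.
stream = [
    ("randn", "a"),
    ("randn", "b"),
    ("matmul", "c", "a", "b"),
]
for instr in rewrite(stream, num_workers=2):
    print(instr)
```

The matmul lands on whichever worker holds `a`, so the client never has to reason about placement.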
- Instruction streams are annotated:
  - Clients can send "signals" which allow the dispatcher to shard tensors more appropriately
  - Dispatchers annotate "hot" paths to give workers hints about JIT compilation
  - Annotations are supplemented with YAML configs that specify sharding and compilation information
  - Every annotation can be safely ignored, so the same code can run anywhere (just remove the YAML)
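The "safely ignored" property amounts to treating every annotation as a best-effort hint layered over defaults. A minimal sketch in plain Python (the `plan` function and hint keys are hypothetical, not GT's config schema):

```python
# Toy sketch (hypothetical names, not GT's schema): annotations are
# best-effort hints, so execution must never depend on their presence.

DEFAULTS = {"shard": "replicate", "compile": False}

def plan(op_name, hints=None):
    """Merge optional annotations over defaults; missing hints are fine."""
    merged = dict(DEFAULTS)
    merged.update(hints or {})
    return op_name, merged

# With a YAML-derived hint the op is sharded by rows...
print(plan("matmul", {"shard": "row"}))
# ...and with no config at all the same call still runs, just unsharded.
print(plan("matmul"))
```

Deleting the YAML only changes the second element of the plan, never whether the program runs.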
```python
import gt

a = gt.randn(1000, 1000)
b = gt.randn(1000, 1000)
c = a @ b
result = c[:4, :4]
print(result)
```

It may not look like it, but in the background GT automatically spins up an asynchronous dispatching server and a GPU worker.
- High-performance transport - ZeroMQ (ZMQ) with automatic message batching and efficient DEALER/ROUTER pattern
- Autograd support - Tape-based automatic differentiation, handled entirely at the client layer
- PyTorch-compatible API - Familiar syntax for tensor operations
- Signal-based sharding - Declarative YAML configuration for distributed training
- Real-time monitoring - htop-style visualization of worker activity
- Instruction logging - Debug distributed execution with timeline visualizations
- AI-assisted development - Optimized for collaboration with AI coding assistants
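The tape-based autograd idea can be illustrated with a minimal scalar tape (an independent sketch, not GT's implementation, whose real tape records remote tensor ops): every operation appends a backward step to a shared tape, and `backward()` replays the tape in reverse.

```python
# Minimal scalar reverse-mode tape (illustrative only): ops record how to
# push gradients backward, and backward() replays the records in reverse.

class Scalar:
    def __init__(self, value, tape=None):
        self.value = value
        self.grad = 0.0
        self.tape = tape if tape is not None else []

    def __mul__(self, other):
        out = Scalar(self.value * other.value, self.tape)
        def backward_step():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        self.tape.append(backward_step)
        return out

    def __add__(self, other):
        out = Scalar(self.value + other.value, self.tape)
        def backward_step():
            self.grad += out.grad
            other.grad += out.grad
        self.tape.append(backward_step)
        return out

    def backward(self):
        self.grad = 1.0
        for step in reversed(self.tape):
            step()

tape = []
x = Scalar(3.0, tape)
y = Scalar(4.0, tape)
loss = x * y + x        # d(loss)/dx = y + 1 = 5, d(loss)/dy = x = 3
loss.backward()
print(x.grad, y.grad)   # 5.0 3.0
```

Because the tape lives entirely on the client, the workers never need to know that differentiation is happening at all.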
See the examples/ directory for demonstrations:

- `demo.py` - Basic tensor operations
- `signal_demo.py` - Signal-based sharding
- `compile_demo.py` - Compilation directives
- `debug_demo.py` - Debug utilities
- `visualize_demo.py` - Instruction tape visualization
┌─────────────────────────────────────────────────────────────────┐
│ User Code │
│ import gt │
│ with gt.signal.context('layer1'): │
│ x = gt.randn(100, 64) │
│ loss = model(x) │
│ loss.backward() │
└──────────────────────┬──────────────────────────────────────────┘
│ PyTorch-like API + Signal Metadata
│
┌──────────────────────▼──────────────────────────────────────────┐
│ gt/client/ │
│ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ Tensor │ │ Autograd │ │ nn.Module │ │
│ │ (Remote Data)│ │ (Tape) │ │ (Layers) │ │
│ └──────────────┘ └─────────────┘ └──────────────┘ │
└──────────────────────┬──────────────────────────────────────────┘
│ ZMQ (DEALER → ROUTER)
│
┌──────────────────────▼──────────────────────────────────────────┐
│ gt/dispatcher/ │
│ • ZMQ ROUTER socket handles all connections │
│ • Reads signal configs from YAML │
│ • Routes operations based on sharding strategy │
│ • Logs instruction stream to file │
│ • Handles multiple clients concurrently │
└───────┬──────────────┬──────────────┬───────────────────────────┘
│ │ │ ZMQ (DEALER ← ROUTER)
│ │ │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│Worker 0│ │Worker 1│ │Worker N│ (1 per GPU)
│PyTorch │ │PyTorch │ │PyTorch │
│ GPU │ │ GPU │ │ GPU │
└────────┘ └────────┘ └────────┘
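A toy view of the dispatcher's sharding decision (pure Python, no ZMQ; the function names here are hypothetical, not GT's API): rows are dealt out across workers, and a read gathers them back in their original order.

```python
# Toy row-sharding sketch (no ZMQ, no GPUs; hypothetical names): the
# dispatcher splits rows across N workers and gathers pieces on read.

def shard_rows(rows, num_workers):
    """Deal rows out to workers round-robin, preserving relative order."""
    shards = [[] for _ in range(num_workers)]
    for i, row in enumerate(rows):
        shards[i % num_workers].append(row)
    return shards

def gather(shards):
    """Reassemble rows in their original order from round-robin shards."""
    n = sum(len(s) for s in shards)
    return [shards[i % len(shards)][i // len(shards)] for i in range(n)]

matrix = [[r] * 4 for r in range(6)]        # 6 rows of width 4
shards = shard_rows(matrix, num_workers=3)  # worker k holds rows k and k+3
assert gather(shards) == matrix             # round-trip preserves order
```

The DEALER/ROUTER sockets in the real system carry these shards as messages, but the placement logic is the same kind of bookkeeping.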
GT is designed to be understood, modified, and debugged with AI coding assistants:
- CLAUDE.md - Detailed architecture documentation for AI assistants
- Declarative YAML configs - Easy for AI to parse and generate
- Tape-based debugging - Inspect computation graphs with `gt.debug.print_tape()`
- Instruction logging - Track every operation with timestamps
- Comprehensive test suite - 50+ tests serving as executable specifications
Contributions welcome! This is a research prototype focused on simplicity and readability.
See Contributing Guide for development workflow, testing, code style, and PR guidelines.
MIT
See License for details.
