Show HN: Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf

Original link: https://fory.apache.org/blog/2025/10/29/fory_rust_versatile_serialization_framework/

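## Apache Fory Rust: Blazing-Fast, Flexible Serialization

Apache Fory Rust is a new cross-language serialization framework that aims to overcome the traditional trade-off between performance and flexibility. It offers excellent speed, often outperforming JSON, Protocol Buffers, and other solutions, while automatically handling complex scenarios such as circular references, trait objects, and schema evolution, *without* IDL files or manual schema management.

Fory achieves this through compile-time code generation, a sophisticated binary protocol with efficient encoding and reference tracking, and a modular architecture. It supports a wide range of types, including primitives, collections, smart pointers, and custom structs, and works seamlessly across languages such as Rust, Java, Python, and C++.

Key features include automatic shared/circular reference handling, easy trait object serialization, and compatible schema evolution that allows independent microservice deployments. Fory is thread-safe after registration and provides robust error handling. It is well suited to high-performance applications such as microservices, data pipelines, and real-time systems where speed and flexibility are critical.

The project is open source under the Apache 2.0 license and is actively seeking community contributions.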

## Apache Fory: A Fast, Cross-Language Serialization Framework

Apache Fory is a new serialization framework aiming for 10-20x faster performance than JSON or Protobuf, particularly with nested objects. It achieves this through compile-time code generation, a compact binary protocol, and CPU-optimized layout. Key features include cross-language compatibility (Rust↔Python/Java/Go) *without* IDL files, support for trait objects and circular references, and schema evolution capabilities.

While benchmarks show promising speed gains, some discussion centers on the fairness of comparisons to formats like FlatBuffers and the potential need for an IDL for larger, multi-language projects. The project is currently in its early stages, with JavaScript support experimental and ongoing development focused on expanding features and addressing concerns around benchmark accuracy and dependency management. It aims to bridge the gap between data-oriented programming and object-oriented approaches, offering flexibility for developers.

Original article

TL;DR: Apache Fory Rust is a blazingly fast, cross-language serialization framework that automatically handles circular references, trait objects, and schema evolution. Built with Rust's safety guarantees and zero-copy techniques, it's designed for developers who refuse to compromise between performance and developer experience.


The Serialization Dilemma

Every backend engineer has faced this moment: your application needs to serialize complex data structures (nested objects, circular references, polymorphic types), and you're forced to choose between three bad options:

  1. Fast but fragile: Hand-rolled binary formats that break with schema changes
  2. Flexible but slow: JSON/Protobuf with 10x performance overhead
  3. Complex and limiting: Existing solutions that don't support your language's advanced features

Apache Fory Rust eliminates this false choice. It's a serialization framework that delivers exceptional performance while automatically handling the complexities of modern applications—no IDL files, no manual schema management, no compromises.

What Makes Apache Fory Rust Different?

1. Truly Cross-Language

Apache Fory Rust speaks the same binary protocol as the Java, Python, C++, Go, and other language implementations. Serialize data in Rust, deserialize in Python, and it just works. No schema files. No separate code-generation step. No version mismatches.


// Rust: serialize
let user = User {
    name: "Alice".to_string(),
    age: 30,
    metadata: HashMap::from([("role", "admin")]),
};
let bytes = fory.serialize(&user);

# Python: deserialize the same bytes
user = fory.deserialize(bytes)  # Just works!

This isn't just convenient — it changes how we develop microservices architectures where different teams use different languages.

2. Automatic Shared/Circular Reference Handling

Most serialization frameworks panic when encountering circular references. Apache Fory tracks and preserves reference identity automatically:

Shared Reference:

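use fory::Fory;
use std::rc::Rc;

let fory = Fory::default();

// Create a shared value
let shared = Rc::new(String::from("shared_value"));

// Reference it multiple times
let data = vec![shared.clone(), shared.clone(), shared.clone()];

// The shared value is serialized once and referenced thereafter
let bytes = fory.serialize(&data);
let decoded: Vec<Rc<String>> = fory.deserialize(&bytes)?;

// Verify the contents
assert_eq!(decoded.len(), 3);
assert_eq!(*decoded[0], "shared_value");

// Reference identity is preserved: all three point to the same allocation
assert!(Rc::ptr_eq(&decoded[0], &decoded[1]));
assert!(Rc::ptr_eq(&decoded[1], &decoded[2]));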
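To make "serialize once, reference thereafter" concrete, here is a minimal std-only sketch of one way a serializer can preserve shared identity: remember the address of every `Rc` already written and emit a back-reference on repeats. The `Token` type and both functions are hypothetical names for illustration, and the layout is not Fory's wire format.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// One entry in a toy output stream (hypothetical, not Fory's wire format).
#[derive(Debug, PartialEq)]
enum Token {
    /// First occurrence of an object: carries the payload.
    Value(String),
    /// Repeat occurrence: index of the object, in order of first appearance.
    BackRef(usize),
}

/// Writes each distinct allocation once and a back-reference afterwards.
fn encode_shared(items: &[Rc<String>]) -> Vec<Token> {
    // Pointer address -> object index. Addresses stay valid because the
    // caller keeps every Rc alive for the duration of the call.
    let mut seen: HashMap<*const String, usize> = HashMap::new();
    let mut out = Vec::new();
    for item in items {
        let addr = Rc::as_ptr(item);
        if let Some(&id) = seen.get(&addr) {
            out.push(Token::BackRef(id));
        } else {
            let id = seen.len();
            seen.insert(addr, id);
            out.push(Token::Value((**item).clone()));
        }
    }
    out
}

/// Rebuilds the vector so that back-references share one allocation again.
fn decode_shared(tokens: &[Token]) -> Vec<Rc<String>> {
    let mut objects: Vec<Rc<String>> = Vec::new();
    let mut out = Vec::new();
    for token in tokens {
        match token {
            Token::Value(s) => {
                let rc = Rc::new(s.clone());
                objects.push(Rc::clone(&rc));
                out.push(rc);
            }
            Token::BackRef(id) => out.push(Rc::clone(&objects[*id])),
        }
    }
    out
}
```

Encoding three clones of one `Rc<String>` with this sketch produces one `Value` token followed by two `BackRef(0)` tokens, and decoding restores three handles for which `Rc::ptr_eq` holds.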

Circular Reference:

use fory::{ForyObject, RcWeak};
use std::cell::RefCell;
use std::rc::Rc;

#[derive(ForyObject)]
struct Node {
    value: i32,
    parent: RcWeak<RefCell<Node>>,
    children: Vec<Rc<RefCell<Node>>>,
}

// Build a parent-child tree with a weak back-reference to the parent
let parent = Rc::new(RefCell::new(Node { ... }));
let child = Rc::new(RefCell::new(Node {
    parent: RcWeak::from(&parent),
    ...
}));
parent.borrow_mut().children.push(child.clone());

// Serialization handles the cycle
let bytes = fory.serialize(&parent);
let decoded: Rc<RefCell<Node>> = fory.deserialize(&bytes)?;

// The child's parent link points back to the decoded root
assert!(Rc::ptr_eq(&decoded, &decoded.borrow().children[0].borrow().parent.upgrade().unwrap()));

This isn't just a feature—it's essential for graph databases, object-relational mappers, and domain models.
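The example above elides the node fields, so here is a std-only sketch of the same ownership shape for readers who want to run it in isolation. It uses `std::rc::Weak` as a stand-in for Fory's `RcWeak` (an assumption that the two play the same role here), and nothing in it calls Fory. The helper names `new_node` and `build_family` are made up for illustration.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Tree node: strong links point down, a weak link points up.
struct Node {
    value: i32,
    parent: Weak<RefCell<Node>>,
    children: Vec<Rc<RefCell<Node>>>,
}

/// Creates a node with no children and an optional back-link to its parent.
fn new_node(value: i32, parent: Option<&Rc<RefCell<Node>>>) -> Rc<RefCell<Node>> {
    let parent = match parent {
        Some(p) => Rc::downgrade(p),
        // A root has no parent: an empty Weak that never upgrades.
        None => Weak::new(),
    };
    Rc::new(RefCell::new(Node {
        value,
        parent,
        children: Vec::new(),
    }))
}

/// Builds a parent with one child that points back at it.
fn build_family() -> Rc<RefCell<Node>> {
    let parent = new_node(1, None);
    let child = new_node(2, Some(&parent));
    parent.borrow_mut().children.push(child);
    parent
}
```

Because the upward link is weak, the cycle does not keep either node alive on its own. That is also why a serializer has to treat the weak edge as a reference to an object it has already seen rather than as a value to recurse into.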

3. Trait Object Serialization

Rust's trait system enables powerful abstractions, but serializing Box<dyn Trait> is notoriously difficult. Apache Fory makes it trivial:

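use fory::{ForyObject, Serializer, register_trait_type};

trait Animal: Serializer {
    fn speak(&self) -> String;
}

#[derive(ForyObject)]
struct Dog { name: String, breed: String }

#[derive(ForyObject)]
struct Cat { name: String, color: String }

// Each concrete type must implement the trait (bodies are illustrative)
impl Animal for Dog { fn speak(&self) -> String { "Woof".to_string() } }
impl Animal for Cat { fn speak(&self) -> String { "Meow".to_string() } }

// Register the implementations of the trait
register_trait_type!(Animal, Dog, Cat);

// Serialize a heterogeneous collection
let animals: Vec<Box<dyn Animal>> = vec![
    Box::new(Dog { ... }),
    Box::new(Cat { ... }),
];

let bytes = fory.serialize(&animals);
let decoded: Vec<Box<dyn Animal>> = fory.deserialize(&bytes)?;

// Polymorphic calls work on the decoded values
decoded[0].speak();
decoded[1].speak();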

Alternative: Using dyn Any without trait registration:

use std::rc::Rc;
use std::any::Any;

// Wrap concrete values as dyn Any
let dog: Rc<dyn Any> = Rc::new(Dog { name: "Rex".to_string(), breed: "Labrador".to_string() });
let cat: Rc<dyn Any> = Rc::new(Cat { name: "Whiskers".to_string(), color: "Orange".to_string() });

let bytes = fory.serialize(&dog);
let decoded: Rc<dyn Any> = fory.deserialize(&bytes)?;

// Downcast to recover the concrete type
let unwrapped = decoded.downcast_ref::<Dog>().unwrap();
assert_eq!(unwrapped.name, "Rex");

Supports:

  • Box<dyn Trait> - Owned trait objects
  • Rc<dyn Trait> / Arc<dyn Trait> - Reference-counted trait objects
  • Rc<dyn Any> / Arc<dyn Any> - Runtime type dispatch without traits
  • Auto-generated wrapper types for standalone serialization

This unlocks plugin systems, heterogeneous collections, and extensible architectures that were previously impossible to serialize.
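Why does a trait object need registration at all? A decoder has to know which concrete type to construct, so something must record it per element. A common technique is to prefix each element with a type tag and dispatch on it when reading. The sketch below shows that idea with std only. The tag values, the `tag` and `name` methods, and the byte layout are all assumptions for illustration, and this is not necessarily what `register_trait_type!` expands to.

```rust
/// Trait whose implementors can describe themselves to a toy encoder.
trait Animal {
    fn speak(&self) -> String;
    /// Hypothetical numeric tag identifying the concrete type.
    fn tag(&self) -> u8;
    fn name(&self) -> &str;
}

struct Dog { name: String }
struct Cat { name: String }

impl Animal for Dog {
    fn speak(&self) -> String { format!("{} says woof", self.name) }
    fn tag(&self) -> u8 { 1 }
    fn name(&self) -> &str { &self.name }
}

impl Animal for Cat {
    fn speak(&self) -> String { format!("{} says meow", self.name) }
    fn tag(&self) -> u8 { 2 }
    fn name(&self) -> &str { &self.name }
}

/// Layout per element: [tag: u8][name_len: u8][name bytes].
fn encode_animals(animals: &[Box<dyn Animal>]) -> Vec<u8> {
    let mut out = Vec::new();
    for animal in animals {
        let name = animal.name().as_bytes();
        out.push(animal.tag());
        out.push(name.len() as u8); // sketch only: assumes names under 256 bytes
        out.extend_from_slice(name);
    }
    out
}

/// Reads the tag first, then constructs the matching concrete type.
fn decode_animals(bytes: &[u8]) -> Option<Vec<Box<dyn Animal>>> {
    let mut out: Vec<Box<dyn Animal>> = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let tag = bytes[pos];
        let len = *bytes.get(pos + 1)? as usize;
        let raw = bytes.get(pos + 2..pos + 2 + len)?;
        let name = String::from_utf8(raw.to_vec()).ok()?;
        let animal: Box<dyn Animal> = match tag {
            1 => Box::new(Dog { name }),
            2 => Box::new(Cat { name }),
            _ => return None, // unknown (unregistered) type tag
        };
        out.push(animal);
        pos += 2 + len;
    }
    Some(out)
}
```

The `match` on the tag plays the role of a registry: a tag the decoder has never been told about cannot be turned back into a value.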

4. Schema Evolution Without Breaking Changes

Microservices evolve independently. Apache Fory's Compatible mode allows schema changes without coordination:

use fory::{Fory, ForyObject};
use std::collections::HashMap;

// Service A: version 1 of the schema
#[derive(ForyObject)]
struct User {
    name: String,
    age: i32,
    address: String,
}

let mut fory_v1 = Fory::default().compatible(true);
fory_v1.register::<User>(1);

// Service B: version 2 of the schema (a separate codebase, so the name is reused)
#[derive(ForyObject)]
struct User {
    name: String,
    age: i32,
    // `address` removed, two fields added
    phone: Option<String>,
    metadata: HashMap<String, String>,
}

let mut fory_v2 = Fory::default().compatible(true);
fory_v2.register::<User>(1);

// Data written by v1 can be read by v2
let v1_bytes = fory_v1.serialize(&user_v1);
let user_v2: User = fory_v2.deserialize(&v1_bytes)?;

Compatibility rules:

  • ✅ Add new fields (default values applied)
  • ✅ Remove fields (skipped during deserialization)
  • ✅ Reorder fields (matched by name)
  • ✅ Change nullability (T ↔ Option<T>)
  • ❌ Type changes (except nullable variants)

This is critical for zero-downtime deployments and polyglot microservices.
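The first three compatibility rules follow naturally once fields are matched by name instead of by position. Here is a minimal std-only sketch of that idea: the reader looks up the fields it knows, ignores the rest, and falls back to defaults for anything missing. `Value`, `UserV2`, and `decode_user_v2` are hypothetical names, and the representation is far simpler than a real wire format.

```rust
use std::collections::HashMap;

/// A toy dynamic value; a real protocol carries richer type information.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i32),
    Text(String),
}

/// Version 2 of a record: `address` was removed and `phone` was added.
#[derive(Debug, PartialEq)]
struct UserV2 {
    name: String,
    age: i32,
    phone: Option<String>,
}

/// Decodes by field name: unknown fields are skipped, missing ones defaulted.
fn decode_user_v2(fields: &[(String, Value)]) -> UserV2 {
    let by_name: HashMap<&str, &Value> =
        fields.iter().map(|(k, v)| (k.as_str(), v)).collect();
    UserV2 {
        name: match by_name.get("name").copied() {
            Some(Value::Text(s)) => s.clone(),
            _ => String::new(),
        },
        age: match by_name.get("age").copied() {
            Some(Value::Int(n)) => *n,
            _ => 0,
        },
        // Never written by version 1, so it decodes to the default.
        phone: match by_name.get("phone").copied() {
            Some(Value::Text(s)) => Some(s.clone()),
            _ => None,
        },
    }
}
```

Feeding it version 1 data in any field order, including the dropped `address` field, yields a `UserV2` with `phone` set to `None`. This sketch falls back to a default when a field's type does not match, which is one possible policy for the last rule. A stricter reader would report an error instead.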

The Technical Foundation

Protocol Design

Apache Fory uses a sophisticated binary protocol designed for both performance and flexibility:

| fory header | reference meta | type meta | value data |

Key innovations:

  1. Efficient encoding: Variable-length integers, compact type IDs, bit-packed flags
  2. Reference tracking: Deduplicates shared objects automatically (serialize once, reference thereafter)
  3. Meta compression: Gzip compression for type metadata in meta-sharing mode
  4. Little-endian layout: Optimized for modern CPU architectures
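Variable-length integers are the easiest of these to show in isolation. The sketch below implements a common base-128 scheme (LEB128-style): seven payload bits per byte, with the high bit marking continuation. Whether Fory's varint uses exactly this layout is an assumption, and the function names are made up for illustration.

```rust
/// Appends `value` to `out` as a little-endian base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // high bit clear: this is the last byte
            return;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Decodes a varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 needs at most 10 groups of 7 bits.
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated input, or no terminating byte within 10 bytes
}
```

With this scheme any value below 128 takes one byte instead of a fixed four or eight, and 300 takes two bytes (`0xAC 0x02`). That pays off when most lengths, IDs, and counts are small.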

Compile-Time Code Generation

Unlike reflection-based frameworks, Apache Fory generates serialization code at compile time via procedural macros:

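use fory::ForyObject;

#[derive(ForyObject)]
struct Person {
    name: String,
    age: i32,
    address: Address,
}

// The derive macro generates the serialization and deserialization
// code for Person at compile time.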
Benefits:

  • Zero runtime overhead: No reflection, no vtable lookups
  • 🛡️ Type safety: Compile-time errors instead of runtime panics
  • 📦 Small binary size: Only code for types you actually use
  • 🔍 IDE support: Full autocomplete and error checking
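To give a feel for what "generated at compile time" means, here is a hand-written version of the kind of code a derive macro could emit. The `Encode` trait and its byte layout are hypothetical stand-ins, and the real `ForyObject` expansion will differ. The point is only that each field becomes a direct, statically typed call.

```rust
/// Hypothetical trait standing in for whatever the derive macro implements.
trait Encode {
    fn encode(&self, out: &mut Vec<u8>);
}

impl Encode for i32 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
}

impl Encode for String {
    fn encode(&self, out: &mut Vec<u8>) {
        // Length prefix (u32, little-endian) followed by the UTF-8 bytes.
        out.extend_from_slice(&(self.len() as u32).to_le_bytes());
        out.extend_from_slice(self.as_bytes());
    }
}

struct Person {
    name: String,
    age: i32,
}

// The kind of impl a derive macro could emit: one statically dispatched
// call per field, in declaration order, with no runtime type inspection.
impl Encode for Person {
    fn encode(&self, out: &mut Vec<u8>) {
        self.name.encode(out);
        self.age.encode(out);
    }
}
```

Because the compiler sees every call, it can inline the field encoders, and a field whose type has no `Encode` impl fails the build instead of failing at runtime.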

Architecture

Apache Fory Rust consists of three focused crates:

fory/              # High-level API
└─ Convenience wrappers, derive re-exports

fory-core/         # Core serialization engine
├─ fory.rs         # Main entry point
├─ buffer.rs       # Zero-copy binary I/O
├─ serializer/     # Type-specific serializers
├─ resolver/       # Type registration & dispatch
├─ meta/           # Meta string compression
└─ row/            # Row format implementation

fory-derive/       # Procedural macros
├─ object/         # ForyObject derive macro
└─ fory_row.rs     # ForyRow derive macro

This modular design ensures clean separation of concerns and makes the codebase maintainable.

Benchmarks: Real-World Performance

| Datatype | Size | Operation | Fory TPS | JSON TPS | Protobuf TPS | Fastest |
| --- | --- | --- | ---: | ---: | ---: | --- |
| company | small | serialize | 10,063,906 | 761,673 | 896,620 | fory |
| company | medium | serialize | 412,507 | 33,835 | 37,590 | fory |
| company | large | serialize | 9,183 | 793 | 880 | fory |
| ecommerce_data | small | serialize | 2,350,729 | 206,262 | 256,970 | fory |
| ecommerce_data | medium | serialize | 59,977 | 4,699 | 5,242 | fory |
| ecommerce_data | large | serialize | 3,727 | 266 | 295 | fory |
| person | small | serialize | 13,632,522 | 1,345,189 | 1,475,035 | fory |
| person | medium | serialize | 3,839,656 | 337,610 | 369,031 | fory |
| person | large | serialize | 907,853 | 79,631 | 91,408 | fory |
| simple_list | small | serialize | 27,726,945 | 4,874,957 | 4,643,172 | fory |
| simple_list | medium | serialize | 4,770,765 | 401,558 | 397,551 | fory |
| simple_list | large | serialize | 606,061 | 41,061 | 44,565 | fory |
| simple_map | small | serialize | 22,862,369 | 3,888,025 | 2,695,999 | fory |
| simple_map | medium | serialize | 2,128,973 | 204,319 | 193,132 | fory |
| simple_map | large | serialize | 177,847 | 18,419 | 18,668 | fory |
| simple_struct | small | serialize | 35,729,598 | 10,167,045 | 8,633,342 | fory |
| simple_struct | medium | serialize | 34,988,279 | 9,737,098 | 6,433,350 | fory |
| simple_struct | large | serialize | 31,801,558 | 4,545,041 | 7,420,049 | fory |
| system_data | small | serialize | 5,382,131 | 468,033 | 569,930 | fory |
| system_data | medium | serialize | 174,240 | 11,896 | 14,753 | fory |
| system_data | large | serialize | 10,671 | 876 | 1,040 | fory |
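Raw throughput numbers are hard to compare at a glance, so it helps to turn a few rows into ratios. The snippet below does that arithmetic for three rows copied from the table. The `Row` struct and helper names are illustrative only.

```rust
/// Throughput (operations per second) for one benchmark row.
struct Row {
    name: &'static str,
    fory: f64,
    json: f64,
    protobuf: f64,
}

/// Returns (fory / json, fory / protobuf) for a row.
fn speedups(row: &Row) -> (f64, f64) {
    (row.fory / row.json, row.fory / row.protobuf)
}

/// A few rows copied from the benchmark table.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { name: "company/small", fory: 10_063_906.0, json: 761_673.0, protobuf: 896_620.0 },
        Row { name: "system_data/medium", fory: 174_240.0, json: 11_896.0, protobuf: 14_753.0 },
        Row { name: "simple_struct/small", fory: 35_729_598.0, json: 10_167_045.0, protobuf: 8_633_342.0 },
    ]
}

/// Formats one summary line per row.
fn report(rows: &[Row]) -> Vec<String> {
    rows.iter()
        .map(|r| {
            let (vs_json, vs_pb) = speedups(r);
            format!("{}: {:.1}x vs JSON, {:.1}x vs Protobuf", r.name, vs_json, vs_pb)
        })
        .collect()
}
```

On these rows the ratios come out at roughly 13x and 11x for company/small, about 15x and 12x for system_data/medium, and about 3.5x and 4.1x for simple_struct/small. In this table, then, the gap is widest on nested data and narrower on flat structs.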

When to Use Apache Fory Rust

Ideal Use Cases

  1. Microservices with polyglot teams

    • Different services in different languages
    • Need seamless data exchange without schema files
    • Schema evolution across independent deployments
  2. High-performance data pipelines

    • Processing millions of records per second
    • Memory-constrained environments (use row format)
    • Analytics workloads with selective field access
  3. Complex domain models

    • Circular references (parent-child relationships, graphs)
    • Polymorphic types (trait objects, inheritance hierarchies)
    • Rich object graphs with shared references
  4. Real-time systems

    • Low-latency requirements (<1ms serialization)
    • Memory-mapped file access
    • Zero-copy deserialization critical
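Selective field access and zero-copy reads are easier to picture with a toy row layout. In the sketch below, a header stores the offset of every field, so a reader can jump straight to one field and hand back a `&str` that borrows from the buffer. The layout and function names are assumptions for illustration and do not describe Fory's actual row format.

```rust
/// Toy row layout (an assumption, not Fory's row format):
/// [field_count: u32][offset_0: u32]...[offset_n-1: u32][fields...]
/// where each field is [len: u32][UTF-8 bytes] and integers are little-endian.
fn encode_row(fields: &[&str]) -> Vec<u8> {
    let header = 4 + 4 * fields.len();
    let mut out = vec![0u8; header];
    out[..4].copy_from_slice(&(fields.len() as u32).to_le_bytes());
    for (i, field) in fields.iter().enumerate() {
        // Record where this field starts, then append it.
        let offset = out.len() as u32;
        out[4 + 4 * i..8 + 4 * i].copy_from_slice(&offset.to_le_bytes());
        out.extend_from_slice(&(field.len() as u32).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

/// Reads a little-endian u32 at `at`, or None if the buffer is too short.
fn read_u32(buf: &[u8], at: usize) -> Option<usize> {
    let b = buf.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

/// Returns field `index` as a `&str` borrowed from `buf`: nothing is copied,
/// and the other fields are never decoded.
fn get_field(buf: &[u8], index: usize) -> Option<&str> {
    let count = read_u32(buf, 0)?;
    if index >= count {
        return None;
    }
    let offset = read_u32(buf, 4 + 4 * index)?;
    let len = read_u32(buf, offset)?;
    let bytes = buf.get(offset + 4..offset + 4 + len)?;
    std::str::from_utf8(bytes).ok()
}
```

Since `get_field` only borrows, the same approach works on a memory-mapped file: the returned `&str` points directly into the mapped bytes.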

⚠️ Consider Alternatives If

  1. You need human-readable data: Use JSON/YAML for debugging
  2. You need long-term storage format: Use Parquet for data lakes
  3. Your data is trivial: serde + bincode is simpler for basic types

Getting Started in 5 Minutes

Installation

Add to Cargo.toml:

[dependencies]
fory = "0.13"

Basic Object Serialization

use fory::{Fory, Error, ForyObject};

#[derive(ForyObject, Debug, PartialEq)]
struct User {
    name: String,
    age: i32,
    email: String,
}

fn main() -> Result<(), Error> {
    // Register the type before serializing it
    let mut fory = Fory::default();
    fory.register::<User>(1);

    let user = User {
        name: "Alice".to_string(),
        age: 30,
        email: "alice@example.com".to_string(),
    };

    // Serialize
    let bytes = fory.serialize(&user);

    // Deserialize
    let decoded: User = fory.deserialize(&bytes)?;
    assert_eq!(user, decoded);
    Ok(())
}

Cross-Language Serialization

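use fory::Fory;

// Enable cross-language mode
let mut fory = Fory::default().compatible(true).xlang(true);

// Register the type under an ID that every language agrees on
fory.register::<User>(1);

// The resulting bytes can be read by the other language implementations
let bytes = fory.serialize(&user);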

Register types with consistent IDs or names across all languages:

  • By ID (fory.register::<User>(1)): Faster serialization, more compact encoding, but requires coordination to avoid ID conflicts
  • By name (fory.register_by_name::<User>("example.User")): More flexible, less prone to conflicts, easier to manage across teams, but slightly larger encoding
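The size trade-off between the two registration styles can be seen with a toy encoding of the type identifier itself. In this sketch (hypothetical layout, not Fory's type meta format) an ID is written as a fixed four bytes and a name as a length-prefixed string.

```rust
/// How a type is identified on the wire in this sketch (hypothetical layout).
enum TypeKey<'a> {
    Id(u32),
    Name(&'a str),
}

/// Encodes the key as [kind: u8] followed by either a 4-byte little-endian
/// ID or a length-prefixed name.
fn encode_type_key(key: &TypeKey) -> Vec<u8> {
    let mut out = Vec::new();
    match key {
        TypeKey::Id(id) => {
            out.push(0);
            out.extend_from_slice(&id.to_le_bytes());
        }
        TypeKey::Name(name) => {
            out.push(1);
            out.push(name.len() as u8); // sketch only: assumes names under 256 bytes
            out.extend_from_slice(name.as_bytes());
        }
    }
    out
}
```

Here `TypeKey::Id(1)` costs 5 bytes while `TypeKey::Name("example.User")` costs 14, and the name grows with the namespace. A real protocol can shrink both, for instance with varints or by sending each name once per stream, but the direction of the trade-off stays the same.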

Supported Types

Apache Fory Rust supports a comprehensive type system:

Primitives: bool, i8, i16, i32, i64, f32, f64, String

Collections: Vec<T>, HashMap<K,V>, BTreeMap<K,V>, HashSet<T>, Option<T>

Smart Pointers: Box<T>, Rc<T>, Arc<T>, RcWeak<T>, ArcWeak<T>, RefCell<T>, Mutex<T>

Date/Time: chrono::NaiveDate, chrono::NaiveDateTime

Custom Types: Derive ForyObject for object graphs, ForyRow for row format

Trait Objects: Box<dyn T>, Rc<dyn T>, Arc<dyn T>, Rc<dyn Any>, Arc<dyn Any>

Roadmap: What's Next

Apache Fory Rust is production-ready today, and development remains active:

Shipped in v0.13

  • ✅ Static codegen via procedural macros
  • ✅ Row format serialization with zero-copy
  • ✅ Cross-language object graph serialization
  • ✅ Shared and circular reference tracking
  • ✅ Weak pointer support (RcWeak, ArcWeak)
  • ✅ Trait object serialization (Box/Rc/Arc)
  • ✅ Schema evolution in compatible mode

🚧 Coming Soon

🎯 Help Wanted

We're actively seeking contributors for:

  • Performance tuning: Profile and optimize hot paths
  • Documentation: More examples, tutorials, and guides
  • Testing: Fuzzing, property tests, edge case coverage

Production Considerations

Thread Safety

Fory becomes fully thread-safe after registration is complete. Once every type is registered (which requires &mut Fory), wrap the instance in an Arc and freely share it across worker threads for concurrent serialization and deserialization.

use fory::Fory;
use std::{sync::Arc, thread};

// Register all types first; this is the only step that needs `&mut Fory`
let mut fory = Fory::default();
fory.register::<Item>(1)?;
let fory = Arc::new(fory);

let item = Item::default();
let handles: Vec<_> = (0..4)
    .map(|_| {
        let fory = Arc::clone(&fory);
        let input = item.clone();
        thread::spawn(move || {
            let bytes = fory.serialize(&input);
            let decoded: Item = fory.deserialize(&bytes).expect("valid data");
            (bytes, decoded)
        })
    })
    .collect();

for handle in handles {
    let (bytes, decoded) = handle.join().expect("thread finished");
    // Use `bytes` and `decoded` here
}
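The "register first, share afterwards" rule is the borrow checker doing the synchronization for you. The std-only sketch below reproduces the pattern with a stand-in `Registry` type (hypothetical, not part of Fory): mutation needs `&mut self`, lookups need only `&self`, and once the value sits inside an `Arc` the worker threads can only reach the read-only methods.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Minimal stand-in for a serializer that must know its types up front.
#[derive(Default)]
struct Registry {
    names: HashMap<u32, &'static str>,
}

impl Registry {
    /// Registration mutates the table, so it needs `&mut self`.
    fn register(&mut self, id: u32, name: &'static str) {
        self.names.insert(id, name);
    }

    /// Lookups only read, so `&self` is enough and threads can share it.
    fn lookup(&self, id: u32) -> Option<&'static str> {
        self.names.get(&id).copied()
    }
}

/// Registers everything first, then shares the registry through an `Arc`.
fn lookup_from_threads(ids: Vec<u32>) -> Vec<Option<&'static str>> {
    let mut registry = Registry::default();
    registry.register(1, "User");
    registry.register(2, "Item");
    // From here on, worker threads only ever see `&Registry`.
    let registry = Arc::new(registry);

    let handles: Vec<_> = ids
        .into_iter()
        .map(|id| {
            let registry = Arc::clone(&registry);
            thread::spawn(move || registry.lookup(id))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("thread panicked"))
        .collect()
}
```

No mutex is involved: the table is never written after it is shared, so concurrent reads are safe by construction.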

Error Handling

Apache Fory uses Result<T, Error> for all fallible operations:

use fory::Error;

match fory.deserialize::<User>(&bytes) {
    Ok(user) => process_user(user),
    Err(Error::TypeMismatch) => log::error!("Schema mismatch"),
    Err(Error::BufferTooShort) => log::error!("Incomplete data"),
    Err(e) => log::error!("Deserialization failed: {}", e),
}
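For readers who want to see where errors like these come from, here is a self-contained sketch of a tiny decoder that reports the same two failure modes. `DecodeError`, its fields, and the one-byte type tag are all hypothetical. Fory's own `Error` type may carry different variants and data.

```rust
use std::fmt;

/// Hypothetical error type for a toy decoder (not Fory's `Error`).
#[derive(Debug, PartialEq)]
enum DecodeError {
    BufferTooShort { needed: usize, available: usize },
    TypeMismatch { expected: u8, found: u8 },
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::BufferTooShort { needed, available } => {
                write!(f, "buffer too short: need {} bytes, have {}", needed, available)
            }
            DecodeError::TypeMismatch { expected, found } => {
                write!(f, "type mismatch: expected tag {}, found {}", expected, found)
            }
        }
    }
}

impl std::error::Error for DecodeError {}

/// Tag byte that marks an i32 payload in this sketch.
const TAG_I32: u8 = 1;

/// Decodes `[tag: u8][value: i32, little-endian]`, reporting why it failed.
fn decode_i32(buf: &[u8]) -> Result<i32, DecodeError> {
    if buf.len() < 5 {
        return Err(DecodeError::BufferTooShort { needed: 5, available: buf.len() });
    }
    if buf[0] != TAG_I32 {
        return Err(DecodeError::TypeMismatch { expected: TAG_I32, found: buf[0] });
    }
    Ok(i32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]))
}
```

Checking the length before touching the payload is what turns a truncated message into an `Err` the caller can match on, instead of an out-of-bounds panic.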

Community and Contribution

Apache Fory is an Apache Software Foundation project with a vibrant, growing community.

How to Contribute

We welcome contributions of all kinds:

  1. Code: Implement features from the roadmap
  2. Docs: Write tutorials, examples, and guides
  3. Testing: Add benchmarks, fuzz tests, integration tests
  4. Feedback: Report bugs, request features, share use cases

See CONTRIBUTING.md for guidelines.

License

Apache Fory is licensed under the Apache License 2.0, a permissive open-source license that allows commercial use, modification, and distribution.

Conclusion

Apache Fory Rust represents a paradigm shift in serialization:

  • No more trade-offs: Get performance and flexibility
  • No more boilerplate: Derive macros handle the complexity
  • No more lock-in: Trait objects and shared references are supported natively

Whether you're building microservices, data pipelines, or real-time systems, Apache Fory Rust delivers the performance you need with the ergonomics you deserve.

Try it today:

Join the community:

git clone https://github.com/apache/fory.git
cd fory/rust
cargo test --features tests

Share your experience:

  • Write a blog post about your use case
  • Present at your local Rust meetup
  • Contribute benchmarks from your domain