SMERF: Streamable Memory Efficient Radiance Fields

Original link: https://smerf-3d.github.io/

In "Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration," or "SMERF," the researchers introduce a new approach to real-time view synthesis for large virtual environments. The technique balances explicit scene representations with the power of neural fields built on ray marching. Traditional methods for visualizing such environments are limited by prohibitive computational costs, and SMERF offers a significant improvement over them. By employing a hierarchical model partitioning scheme that increases model capacity without compromising performance, together with a distillation training strategy that yields higher-fidelity images, the authors achieve state-of-the-art accuracy on large scenes with footprints up to 300 m², while supporting fully immersive six-degrees-of-freedom interaction with minimal latency. Their results show a 1.78 dB improvement over prior real-time methods on large scenes, with rendering that is orders of magnitude faster than state-of-the-art radiance field models. Ultimately, SMERF offers a practical path to visually rich, interactive experiences with near-zero latency, even on commodity hardware such as smartphones and laptops.

  • Google DeepMind
  • Google Research
  • Google Inc.
  • Tübingen AI Center, University of Tübingen

Abstract

Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the former in quality while being prohibitively expensive for real-time applications. In this work, we introduce SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m^2 at a volumetric resolution of 3.5 mm^3. Our method is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency. Our approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real-time on commodity smartphones and laptops. Extensive experiments show that our method exceeds the current state-of-the-art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones.

Real-Time Interactive Viewer Demos

How we boost representation power to handle large scenes


[Figure: method overview]

(a): We model large multi-room scenes with a number of independent submodels, each of which is assigned to a different region of the scene. During rendering, the submodel is picked based on the camera origin. (b): To model complex view-dependent effects, within each submodel we additionally instantiate grid-aligned copies of the deferred MLP parameters \(\theta\). These parameters are trilinearly interpolated based on the camera origin \(\mathbf{o}\). (c): While each submodel represents the entire scene, only the submodel's associated grid cell is modelled at high resolution, which is realized by contracting the submodel-specific local coordinates.
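As a rough illustration of (a)–(c), the following is a minimal NumPy sketch. All function names and the grid layout are hypothetical, and the mip-NeRF 360-style contraction is shown only as a stand-in; this is not the released SMERF implementation, whose contraction differs in detail.

import numpy as np

def select_submodel(o, scene_min, scene_max, grid_shape):
    # (a) Pick the submodel whose grid cell contains the camera origin o.
    t = (o - scene_min) / (scene_max - scene_min)
    idx = np.clip((t * np.array(grid_shape)).astype(int),
                  0, np.array(grid_shape) - 1)
    return tuple(idx)

def interp_deferred_params(theta_grid, o, cell_min, cell_size):
    # (b) Trilinearly interpolate grid-aligned copies of the deferred MLP
    # parameters theta, based on the camera origin o (assumed in-bounds).
    u = (o - cell_min) / cell_size     # continuous grid coordinates
    i0 = np.floor(u).astype(int)
    f = u - i0                         # fractional offset in the cell
    out = np.zeros_like(theta_grid[0, 0, 0])
    for corner in np.ndindex(2, 2, 2):
        w = np.prod(np.where(np.array(corner) == 1, f, 1.0 - f))
        out = out + w * theta_grid[tuple(i0 + np.array(corner))]
    return out

def contract(x):
    # (c) Squash submodel-local coordinates so that only the submodel's
    # own cell stays at high resolution (mip-NeRF 360-style contraction
    # used here as an illustrative assumption).
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-9)
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))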

Getting the most out of our representation via distillation

[Figure: distillation training setup]

We demonstrate that image fidelity can be greatly boosted via distillation. We first train a state-of-the-art offline radiance field (Zip-NeRF). We then use the RGB color predictions \(\mathbf{c}\) of this teacher model as supervision for our own model. Additionally, we distill the volumetric density values \(\tau\) of the pre-trained teacher by minimizing the discrepancy between the teacher's and the student's volume rendering weights.
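A short sketch of the two supervision signals follows. The loss shapes and their relative weighting are assumptions for illustration; the paper's exact formulation, ray sampling, and loss balancing may differ.

import numpy as np

def render_weights(density, deltas):
    # Volume rendering weights w_i = T_i * (1 - exp(-tau_i * delta_i)),
    # where T_i is the accumulated transmittance along the ray.
    alpha = 1.0 - np.exp(-density * deltas)
    shifted = np.concatenate([np.ones_like(alpha[..., :1]),
                              1.0 - alpha[..., :-1]], axis=-1)
    transmittance = np.cumprod(shifted, axis=-1)
    return transmittance * alpha

def distillation_loss(student_rgb, teacher_rgb, student_w, teacher_w):
    # Photometric term: match the teacher's RGB predictions c.
    photometric = np.mean((student_rgb - teacher_rgb) ** 2)
    # Geometry term: match the teacher's volume rendering weights,
    # which exposes its density field tau to the student.
    geometry = np.mean(np.abs(student_w - teacher_w))
    return photometric + geometry  # equal weighting is an assumption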

Citation

If you want to cite our work, please use:

@misc{duckworth2023smerf,
      title={SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration}, 
      author={Daniel Duckworth and Peter Hedman and Christian Reiser and Peter Zhizhin and Jean-François Thibert and Mario Lučić and Richard Szeliski and Jonathan T. Barron},
      year={2023},
      eprint={2312.07541},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
                

Acknowledgements

The website template was borrowed from Michaël Gharbi. Image sliders are based on dics.
