Marble: A Multimodal World Model

Original link: https://www.worldlabs.ai/blog/marble-world-model

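## Marble: Generative World Model Now Available

Marble, a cutting-edge AI world model, is now generally available. It lets users create and interact with richly detailed 3D worlds from simple text, images, video, or even coarse 3D layouts. This "multimodal" approach allows broad creative control: worlds can be fully generated, interactively edited, expanded, or composed from multiple sources.

Key features include **Chisel**, a tool for precise 3D sculpting, and the ability to expand existing worlds into larger, more detailed environments. Export options include Gaussian splats, meshes, and enhanced videos with dynamic elements.

**Marble Labs** serves as a creative hub. It showcases innovative applications in gaming, VFX, design, and robotics, and it offers tutorials and documentation for users at every level.

The release marks a significant step toward "spatial intelligence": AI that understands and simulates the world around us. The current focus is on creation and editing. Future development aims to enable seamless interaction within these generated worlds, with applications such as simulation and robotics.

Start building your own 3D worlds today at [marble.worldlabs.ai](http://marble.worldlabs.ai)!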

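## Marble: A New Multimodal World Model - Summary

World Labs' "Marble" is a new AI system that generates 3D scenes from text or image prompts using Gaussian splatting. Early reactions from the Hacker News community were mixed. Many commenters were impressed by its visual quality and potential, especially for applications such as game development and filmmaking, while others raised concerns about its limitations.

Key discussion points included:

- quality degradation beyond the viewpoint of the initial image;
- limits on image inputs (four images when combined with text, eight without);
- whether it is truly a "world model" or merely advanced 3D scene generation.

Some users pointed out its similarity to existing tools such as Skybox AI and, given the company's large funding, questioned its novelty.

Commenters also compared it with other projects such as DeepMind's Genie, highlighting a difference in approach: Marble focuses on creating static assets, while Genie aims at real-time video generation. The definition of "world model" itself was debated. Some saw it as a marketing term, while others argued that Marble does not fit the traditional AI/robotics definition. Despite the criticism, many believe Marble could speed up content-creation pipelines, especially for visual media.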

Original article

Spatial intelligence is the next frontier in AI, demanding powerful world models to realize its full potential. World models should reconstruct, generate, and simulate 3D worlds, and they should allow both humans and agents to interact with them. Spatially intelligent world models will transform a wide variety of industries over the coming years.

Two months ago we shared a preview of Marble, our World Model that creates 3D worlds from image or text prompts. Since then, Marble has been available to an early set of beta users to create 3D worlds for themselves.

Today we are making Marble, a first-in-class generative multimodal world model, generally available for anyone to use. We have also drastically expanded Marble's capabilities, and are excited to highlight them here:

Multimodal Marble: Marble is now massively multimodal. Marble can create 3D worlds from text, images, video, or coarse 3D layouts; it also lets you interactively edit, expand, and combine worlds. Once generated, 3D worlds can be exported as Gaussian splats, meshes, or videos. These new capabilities let users create and edit worlds with fine-grained control, and they make those worlds more useful than ever before.

Marble Labs: We are launching Marble Labs, a creative hub where imagination meets experimentation. It is where artists, engineers, and designers push the boundaries of world models, showcasing bold ideas, real-world workflows, and new possibilities across gaming, VFX, design, robotics, and beyond. Marble Labs is also home to in-depth case studies, tutorials, and documentation that give anyone the tools to learn, build, and share their own 3D worlds.

Sign up at marble.worldlabs.ai and start creating worlds for yourself!

The Marble World Model

Our human experience of the world is inherently multimodal: we use all of our senses to make sense of the world around us. We integrate sight, sound, touch, and language to build up a mental model of the outside world; these different representations work together, enriching and reinforcing each other to let us reason about the world and act within it.

World models should work similarly. They should be massively multimodal, able to lift whatever input signals are available into a full 3D world, and they should be able to iteratively update their understanding of the world as new information becomes available.

Marble is the first of its kind: a next-generation world model making strides toward this vision. It can now create 3D worlds from a wide variety of input types, and it lets users iteratively edit or expand worlds.

Marble's new capabilities let you dive as deep as you want in controlling your generated worlds. You can quickly create full 3D worlds from a simple image or text prompt or interactively edit worlds in both 2D and 3D, bringing to life a precise vision of a world in your mind.

Text and Image to World

To start, Marble can create a full 3D world from a single image or a short text prompt. This is the simplest and easiest way to create worlds. Marble can generate worlds with a wide variety of scene types and artistic styles.

Image prompts make it easy to combine Marble with other AI tools. You can generate an image with your favorite image generation model, then bring it into Marble to lift it into a full 3D world.

Text Prompt

A detailed, lived-in hobbit kitchen filled with woven baskets and copper kettles, awash in calm pale-blue daylight and soft ambient shadow

Generated World

Text Prompt

A station kitchen blending mid-century diner aesthetics with orbital tech, featuring checkered floors and stainless fixtures under soft aqua illumination.

Generated World

Text Prompt

A sunlit stone castle courtyard with ivy-covered walls, bathed in warm golden morning light and soft drifting shadows.

Generated World

Text Prompt

A whimsical anime library painted in pastel pinks and mint greens, with cloud-shaped pillows in cozy cubby reading corners.

Generated World

Text-to-world generation

Text and image prompts are intuitive and powerful, but they offer limited creative control: Marble must invent every detail of the world that is not present in the input text or image. The results are often magical, but sometimes you may want to steer Marble more directly toward a specific world.

Multi-Image and Video to World

An easy way to create worlds with more creative control is multi-image prompting. Marble can accept different prompt images for different parts of the world, stitching them together into a unified 3D world.

Multi-image prompts let you create worlds with more precision. Unlike text or single-image prompts where Marble must invent all parts of the world not present in the prompt, with multi-image prompts you can control what the generated world will look like from different angles.

This leads to a brand-new workflow for generating worlds. You can use your favorite image generation tool to iterate separately on the input views, and Marble will lift them into full 3D worlds while also adding seamless transitions between the input views.

Generated World

Multiple Images to World

Multi-image prompts can also be used to create worlds inspired by real-world spaces. Marble can take as input a few photos or a short video of a real-world location captured from different angles, and it combines them to generate a 3D world with elements of the real-world space.

Generated World

Multi-image prompts can generate worlds from real-world photos

World Editing

The creative process is highly iterative for many users. Often, generating a world is only the start of a creative journey. Seeing a generated 3D world often kicks off a dozen more ideas for changing it or improving it.

Marble includes AI-native world editing tools. Edits can be small and local: remove an object, touch up an area. They can also be more drastic: swap objects, change the visual style, or re-structure large parts of the world. This gives a new level of fine-grained control to the world creation process.

Edit: Turn the entire back wall into a stage, and replace the tables with low benches facing the stage

World editing lets you re-imagine the same space in endless different ways.

Edit: Change all the kitchen counters to black granite

Chisel: Sculpting Worlds in 3D

Marble's multimodal inputs and editing features give a lot of control over your generated 3D worlds. But sometimes, creating the world exactly as you see it in your mind's eye requires finer-grained control over the scene layout or exact sizes and positions of objects.

For these situations we are introducing Chisel, an AI-native tool to sculpt Marble worlds directly in 3D.

Chisel is a new experimental editing mode for advanced users creating 3D worlds. It lets you lay out the coarse structure of your world in 3D, either with simple shapes such as boxes and planes or by importing existing 3D assets into the scene.

After laying out the coarse 3D scene, you can add a text prompt to describe the visual style of the scene, or additional elements not present in the coarse layout. Marble will combine these inputs to give you a fully detailed 3D world.

Chisel decouples structure from style. The coarse 3D scene determines the world's structure, while the text prompt controls its overall style. The two can be mixed in any combination, adding a whole new dimension of control to world generation.

Text prompt: A beautiful modern art museum with wooden flooring, filled with colorful paintings and curving sculptures

The coarse 3D scene can be as simple or complex as you want. In addition to building the coarse 3D scene out of basic blocks and walls, you can import existing 3D assets of objects. Objects will be restyled based on the text prompt to give a cohesive 3D world.

Varying the text prompt can give rise to 3D worlds with drastically different visual styles and appearances that all share a common structure determined by the coarse 3D scene.

Text Prompt: A serene scandinavian guesthouse bedroom with stunning views of glaciers

Building Large Worlds by Expanding and Composing

Sometimes bigger really is better. Larger worlds give more possibilities, more space, more room for your creativity to shine. Marble offers two ways to make bigger worlds than ever before.

After a world has been generated, Marble allows one-step expansion to make it larger. You are in control of this process: you can select a region of the world to be expanded, and Marble will create more content to fill the selected region.

Expansion can make worlds larger. Regions of the world that previously broke down into artifacts can become crisp and clean after expansion. Expansion can also add detail to targeted regions of a world: sometimes the back of a table or the far corner of a room is not as crisp as the room's center, and expanding the world in that region can improve it.

Marble can expand scenes to create larger traversable areas

In addition to generating individual worlds, you can compose any number of worlds to build out extremely large spaces with Marble's composer mode. This composition is entirely under your control: you can choose exactly which worlds to compose, and exactly how to lay them out relative to each other. Composing is yet another way to build worlds that follow your creative vision.
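Conceptually, laying worlds out relative to each other amounts to giving each world a rigid transform and merging the transformed results. The post does not describe how Marble's composer mode represents placements, so the following is only a minimal sketch under stated assumptions. Each world is reduced to a list of 3D points (for example, splat centers). The hypothetical `Placement` type allows only a rotation about the vertical axis plus a translation.

```python
import math
from dataclasses import dataclass

# Assumption: a world is reduced to a list of 3D points (e.g. splat centers).
Point = tuple[float, float, float]


@dataclass
class Placement:
    """Hypothetical placement of one world inside the composed scene."""
    yaw_degrees: float  # rotation about the vertical (y) axis
    offset: Point       # translation applied after the rotation


def place(point: Point, placement: Placement) -> Point:
    """Rotate a point about the y axis (right-handed, y up), then translate it."""
    x, y, z = point
    theta = math.radians(placement.yaw_degrees)
    c, s = math.cos(theta), math.sin(theta)
    rx = c * x + s * z
    rz = -s * x + c * z
    ox, oy, oz = placement.offset
    return (rx + ox, y + oy, rz + oz)


def compose(worlds: list[tuple[list[Point], Placement]]) -> list[Point]:
    """Merge several worlds into one scene by applying each world's placement."""
    composed: list[Point] = []
    for points, placement in worlds:
        composed.extend(place(p, placement) for p in points)
    return composed


# Usage: two copies of a 10-unit train car, the second placed right after the first.
car = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
train = compose([
    (car, Placement(0.0, (0.0, 0.0, 0.0))),
    (car, Placement(0.0, (10.0, 0.0, 0.0))),
])
print(train[-1])  # (20.0, 0.0, 0.0)
```

Restricting placements to a yaw rotation plus a translation keeps every world's floor level, which suits layouts such as a train of consecutive cars. A general 4×4 transform would also allow tilt and scale.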

A large train composed with Marble

Exporting Worlds to 3D and Video

After creating a world with Marble you have many options to export it for incorporation into downstream projects.

Gaussian splats are the highest-fidelity representation for Marble worlds. They represent 3D scenes as a large set of semitransparent particles. You can render Gaussian splats in the browser using Spark, our open-source cross-platform renderer integrated with THREE.js.
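To make "a large set of semitransparent particles" concrete: a splat renderer typically sorts the particles that overlap a pixel by depth and alpha-composites them front to back. Each particle's opacity falls off as a Gaussian around its projected center. The sketch below shows that compositing step for a single pixel. It is a simplified illustration, not Spark's implementation. It assumes the splats are already projected to screen space with isotropic (circular) footprints, whereas full Gaussian splatting uses anisotropic covariances and view-dependent color. `Splat2D` and `shade_pixel` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat2D:
    """Hypothetical splat already projected to screen space (simplified: isotropic)."""
    center: tuple[float, float]        # pixel coordinates of the projected center
    sigma: float                       # screen-space standard deviation, in pixels
    depth: float                       # distance from the camera, used for sorting
    color: tuple[float, float, float]  # RGB in [0, 1]
    opacity: float                     # peak opacity in [0, 1]


def shade_pixel(px: float, py: float, splats: list[Splat2D],
                background=(0.0, 0.0, 0.0)) -> tuple[float, float, float]:
    """Front-to-back alpha compositing of the splats covering one pixel."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda s: s.depth):
        dx, dy = px - s.center[0], py - s.center[1]
        # Gaussian falloff of the splat's opacity with distance from its center.
        alpha = s.opacity * math.exp(-0.5 * (dx * dx + dy * dy) / (s.sigma * s.sigma))
        for k in range(3):
            rgb[k] += transmittance * alpha * s.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    # Whatever light is left shows the background.
    return tuple(rgb[k] + transmittance * background[k] for k in range(3))


# Usage: a half-transparent red splat in front of an opaque blue one.
red = Splat2D(center=(0.0, 0.0), sigma=1.0, depth=1.0, color=(1.0, 0.0, 0.0), opacity=0.5)
blue = Splat2D(center=(0.0, 0.0), sigma=1.0, depth=2.0, color=(0.0, 0.0, 1.0), opacity=1.0)
print(shade_pixel(0.0, 0.0, [blue, red]))  # (0.5, 0.0, 0.5)
```

Sorting by depth is what lets the list be passed in any order. The red splat absorbs half the light, so the blue splat behind it contributes only the remaining half.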

Marble worlds can also be exported as triangle meshes. Marble can generate both collider meshes, which are low-fidelity meshes intended for coarse physics simulation, and high-quality meshes, which are intended to match the visual fidelity of Gaussian splats as closely as possible. Exporting worlds as meshes lets them interoperate with many industry-standard tools.
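The post does not say how Marble derives its low-fidelity collider meshes. To illustrate what trading fidelity for simplicity looks like, the sketch below substitutes a classic technique, vertex-clustering decimation. Vertices are snapped to a uniform grid, all vertices in a cell are merged into their average, and triangles that collapse or duplicate one another are dropped. `cluster_simplify` and its `cell_size` parameter are hypothetical.

```python
from collections import defaultdict

Vertex = tuple[float, float, float]
Triangle = tuple[int, int, int]


def cluster_simplify(vertices: list[Vertex], triangles: list[Triangle],
                     cell_size: float) -> tuple[list[Vertex], list[Triangle]]:
    """Hypothetical helper: vertex-clustering decimation on a uniform grid."""
    # Grid cell of each vertex (floor division also handles negative coordinates).
    cell_of = [tuple(int(c // cell_size) for c in v) for v in vertices]
    members = defaultdict(list)
    for i, cell in enumerate(cell_of):
        members[cell].append(i)

    # One representative vertex per occupied cell: the mean of its members.
    new_index = {}
    new_vertices: list[Vertex] = []
    for cell, idxs in members.items():
        n = len(idxs)
        mean = tuple(sum(vertices[i][k] for i in idxs) / n for k in range(3))
        new_index[cell] = len(new_vertices)
        new_vertices.append(mean)

    # Remap triangles; drop those that collapsed to an edge or a point, and duplicates
    # (triangles over the same three vertices count as duplicates regardless of winding).
    seen = set()
    new_triangles: list[Triangle] = []
    for a, b, c in triangles:
        t = (new_index[cell_of[a]], new_index[cell_of[b]], new_index[cell_of[c]])
        if len(set(t)) < 3:
            continue
        key = frozenset(t)
        if key in seen:
            continue
        seen.add(key)
        new_triangles.append(t)
    return new_vertices, new_triangles


# Usage: a unit square plus a near-duplicate corner vertex and a sliver triangle.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0), (0.05, 0.05, 0.0)]
tris = [(0, 1, 2), (0, 2, 3), (0, 4, 1)]
v, t = cluster_simplify(verts, tris, cell_size=0.5)
print(len(v), t)  # 4 [(0, 1, 2), (0, 2, 3)]
```

A larger `cell_size` produces a coarser mesh with fewer triangles. That is usually acceptable for a collider, since physics queries need the rough shape rather than surface detail.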

Marble can export generated worlds as Gaussian splats or triangle meshes

Marble worlds exist in full 3D, but sometimes a video is the best way to share a world. You can use Marble to render generated worlds to videos with pixel-accurate camera control, letting you frame every shot just as you imagine it. In fact, nearly all the videos in this post were generated directly from Marble.
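Camera control for rendered video comes down to two pieces: a camera path sampled at each frame time, and a projection that maps world points to exact pixel coordinates. The sketch below is a generic illustration, not Marble's camera format, and rests on several assumptions:

- a pinhole camera with a focal length in pixels;
- OpenGL-style axes (looking down -z, +y up);
- keyframes that store only position and yaw;
- linear interpolation between keyframes.

`Keyframe`, `camera_at`, and `project` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Keyframe:
    """Hypothetical camera keyframe: position plus yaw only."""
    t: float                              # time in seconds
    position: tuple[float, float, float]  # camera center in world units
    yaw: float                            # rotation about the vertical axis, radians


def camera_at(keys: list[Keyframe], t: float) -> Keyframe:
    """Linearly interpolate the camera between the keyframes that bracket time t."""
    keys = sorted(keys, key=lambda k: k.t)
    if t <= keys[0].t:
        return keys[0]
    for a, b in zip(keys, keys[1:]):
        if t <= b.t:  # here a.t < t <= b.t, so b.t - a.t > 0
            w = (t - a.t) / (b.t - a.t)
            pos = tuple(pa + w * (pb - pa) for pa, pb in zip(a.position, b.position))
            return Keyframe(t, pos, a.yaw + w * (b.yaw - a.yaw))
    return keys[-1]


def project(point, cam: Keyframe, focal_px: float, width: int, height: int):
    """Pinhole projection to pixel coordinates; returns None if behind the camera."""
    dx, dy, dz = (p - q for p, q in zip(point, cam.position))
    # Undo the camera's yaw: rotate the offset by -yaw about the y axis.
    c, s = math.cos(-cam.yaw), math.sin(-cam.yaw)
    xc = c * dx + s * dz
    zc = -s * dx + c * dz
    if zc >= 0:  # the camera looks down -z, so z >= 0 is behind it
        return None
    u = width / 2 + focal_px * xc / -zc
    v = height / 2 - focal_px * dy / -zc  # pixel rows grow downward
    return (u, v)


# Usage: halfway along a 2-second dolly move, project one world point.
keys = [Keyframe(0.0, (0.0, 0.0, 0.0), 0.0), Keyframe(2.0, (4.0, 0.0, 0.0), 0.0)]
cam = camera_at(keys, 1.0)
print(cam.position)  # (2.0, 0.0, 0.0)
print(project((3.0, 0.0, -5.0), cam, focal_px=1000.0, width=1920, height=1080))  # (1160.0, 540.0)
```

Sampling `camera_at` at `frame / fps` for each frame yields the per-frame cameras. A production tool would more likely interpolate rotations with quaternions and positions with splines for smoother motion.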

Marble can also enhance exported videos. Enhanced videos can add detail, remove artifacts, and add dynamic elements to the scene, while maintaining pixel-perfect camera control and adhering to the structure of the generated 3D world.

Enhanced videos clean up artifacts and introduce motion in the scene. Notice the smoke above the chimney, the dancing flames, and the flowing water.

Marble Labs: A Glimpse of Future Possibilities

As you flex your creativity in Marble, Marble Labs may further inspire your imagination. This is where artists, engineers, and designers are already shaping what comes next. From cinematic filmmaking and interactive worlds to robotics simulations and therapeutic environments, these projects show how Marble is turning imagination into reality. Each one reflects a new way of building with world models, both creative and technical. Explore Marble Labs to see what others are creating and discover how you can start building your own worlds today.

From Marble to Spatial Intelligence

Marble is a state-of-the-art generative world model. Today it lets you create worlds from diverse input types, edit them, expand them, and export them. These capabilities give you unprecedented levels of control when creating worlds, and are already enabling a wide variety of creative use cases across industries.

But Marble is just a step on our journey toward spatial intelligence. Going forward, a key opportunity is interactivity. Future world models will let humans and agents alike interact with generated worlds in new ways, unlocking even more use cases in simulation, robotics, and beyond.

Try Marble Today

Marble is available today at marble.worldlabs.ai. Sign up now and start creating worlds!

If you are excited about this vision and want to help us build it, join us!