Text-Guided Interactive Scene Synthesis with Scene Prior Guidance (2025)

Shaoheng Fang, Haitao Yang, Raymond Mooney, Qixing Huang

3D scene synthesis from natural language instructions has become a popular direction in computer graphics, and data-driven generative models have recently driven significant progress. However, previous methods have mainly focused on one-time scene generation and lack the interactive capability to generate, update, or correct scenes according to user instructions. To overcome this limitation, this paper focuses on text-guided interactive scene synthesis. First, we introduce the SceneMod dataset, which comprises 168k paired scenes with textual descriptions of the modifications. Second, to support the interactive scene synthesis task, we propose a two-stage diffusion generative model that integrates scene-prior guidance into the denoising process to explicitly enforce physical constraints and produce more realistic scenes. Experimental results demonstrate that our approach outperforms baseline methods in text-guided scene synthesis tasks. Our system expands the scope of data-driven scene synthesis tasks and provides a novel, more flexible tool for users and designers in 3D scene generation. Code and dataset are available at https://github.com/bshfang/SceneMod.

Citation:

European Association for Computer Graphics (2025).

People

Raymond J. Mooney

Faculty

mooney [at] cs utexas edu

Areas of Interest

Deep Learning, Language to 3D

Labs

Machine Learning