Home - Yuxuan Wang

Publications

Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image

Yuxuan Wang, Xuanyu Yi, Qingshan Xu, Yuan Zhou, Long Chen, Hanwang Zhang

AAAI Conference on Artificial Intelligence (AAAI) 2026

We present Consistent Personalization for 3D Gaussian Splatting (CP-GS), a framework that progressively propagates the appearance of a single-view reference image to novel viewpoints, enabling high-quality 3DGS personalization that remains faithfully aligned with the reference.

[arXiv] [BibTeX]

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang*, Xuanyu Yi*, Haohan Weng*, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, Hanwang Zhang (* co-first authors)

International Conference on Computer Vision (ICCV) 2025

We propose Nautilus, a locality-aware autoencoder for artist-like mesh generation, which leverages the local properties of manifold meshes to achieve structural fidelity and efficient representation.

[arXiv] [BibTeX] [Code]

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

European Conference on Computer Vision (ECCV) 2024

We propose multi-view consistency designs for the diffusion model that harmonize inconsistent multi-view image guidance by exploiting the characteristics of 3D Gaussian Splatting (3DGS), enabling high-quality 3DGS editing.

[arXiv] [BibTeX] [Code]

Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation

Yuxuan Wang, Xiaoyuan Liu

Empirical Methods in Natural Language Processing (EMNLP) 2024

We introduce a plug-and-play debiasing method for zero-shot vision-language models (VLMs) and dynamically ensemble them to address the underrepresentation issue in Scene Graph Generation (SGG) models.

[arXiv] [BibTeX]

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou

European Conference on Computer Vision (ECCV) 2022

We introduce three video boundary understanding tasks (captioning, grounding, and retrieval) on our new dataset, Kinetics-GEB+ (Generic Event Boundary Plus), which contains over 170K captioned boundaries across 12K videos. We also design a new Temporal-based Pairwise Difference (TPD) Modeling method for representing visual differences, which yields significant performance improvements.

[arXiv] [BibTeX] [Code]

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Xiaokang Wei, Bowen Zhang, Xianghui Yang, Yuxuan Wang, Chunchao Guo, Xi Zhao, Yan Luximon

AAAI Conference on Artificial Intelligence (AAAI) 2026

We present PBR3DGen, a two-stage mesh generation method with high-quality physically based rendering (PBR) materials that integrates a novel multi-view PBR material estimation model with a 3D PBR mesh reconstruction model.

[arXiv] [BibTeX] [Code]

NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos

Qingshan Xu, Jiao Liu, Shangshu Yu, Yuxuan Wang, Yuan Zhou, Junbao Zhou, Jiequan Cui, Yew-Soon Ong, Hanwang Zhang

AAAI Conference on Artificial Intelligence (AAAI) 2026

We present NeuSpring, a neural spring field for the reconstruction and simulation of deformable objects from videos, achieving superior reconstruction and simulation performance for current state modeling and future prediction.

[arXiv] [BibTeX] [Code]

Pushing Rendering Boundaries: Hard Gaussian Splatting

Qingshan Xu, Jiequan Cui, Xuanyu Yi, Yuxuan Wang, Yuan Zhou, Yew-Soon Ong, Hanwang Zhang

AAAI Conference on Artificial Intelligence (AAAI) 2026

We propose Hard Gaussian Splatting (HGS), which uses multi-view significant positional gradients and rendering errors to grow hard Gaussians that fill the gaps left by classical Gaussian Splatting in 3D scenes, achieving superior novel view synthesis (NVS) results.

[arXiv] [BibTeX]

DragNeXt: Rethinking Drag-Based Image Editing

Yuan Zhou, Junbao Zhou, Qingshan Xu, Kesen Zhao, Yuxuan Wang, Hao Fei, Richang Hong, Hanwang Zhang

AAAI Conference on Artificial Intelligence (AAAI) 2026

We propose DragNeXt, a simple yet effective editing framework that redefines Drag-Based Image Editing (DBIE) as the deformation, rotation, and translation of user-specified handle regions.

[arXiv] [BibTeX]

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

Stan Weixian Lei*, Difei Gao*, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou (* co-first authors)

AAAI Conference on Artificial Intelligence (AAAI) 2023

We introduce Scene Graph as Prompt (SGP) for symbolic replay, a real-data-free, replay-based method for continual learning on VQA that overcomes the limitations of replay-based methods by using scene graphs instead of images for replay.

[arXiv] [BibTeX] [Code]

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant

Stan Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou

Findings of Empirical Methods in Natural Language Processing (EMNLP) 2022

We introduce a new dataset and a new task, Affordance-centric Question-driven Video Segment Retrieval (AQVSR), which aims to retrieve affordance-centric instructional video segments given users’ questions. To address the task, we develop a straightforward model called Dual Multimodal Encoders (DME).

[arXiv] [BibTeX] [Code]
