Yuxuan Wang 王宇轩

Yuxuan Wang is currently a third-year Ph.D. candidate at the MReaL Lab at Nanyang Technological University (NTU), supervised by Prof. Hanwang Zhang. Before that, he was a Master of Science student at Show Lab at the National University of Singapore (NUS), supervised by Prof. Mike Zheng Shou. He also works closely with Prof. Long Chen from HKUST and Prof. Na Zhao from SUTD.

His research interests include, but are not limited to, 3D Generation, 3D Scene Editing and Understanding, and Vision-Language Understanding.

Curriculum Vitae

Education
  • Nanyang Technological University
    College of Computing and Data Science
    Ph.D. Candidate
    Aug. 2022 - present
  • National University of Singapore
    Electrical and Computer Engineering
    Master of Science
    Aug. 2021 - Jun. 2022
  • Beihang University
    Electronic and Information Engineering
    Bachelor of Engineering
    Sep. 2016 - Jun. 2020
Experience
  • Tencent, Hunyuan
    3D AIGC Center
    Research Intern
    2024
  • MReaL Lab, NTU
    College of Computing and Data Science
    Research Associate
    2022
  • Inspur Co., Ltd
    Inspur International
    Software Development Intern
    2021
News
2025
We introduce Nautilus, our new autoregressive mesh generation framework.
Jan 24
2024
One paper accepted to EMNLP 2024 as Main Conference presentation.
Sep 24
One paper accepted to ECCV 2024 as Poster presentation.
Jul 01
2022
One paper accepted to AAAI 2023 as Oral Presentation.
Nov 18
One paper accepted to EMNLP 2022 as Findings presentation.
Oct 06
One paper accepted to ECCV 2022 as Poster presentation.
Jul 03
We organized the 2nd LOng form VidEo Understanding (LOVEU) Workshop at CVPR 2022.
Jun 19
Publications
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang*, Xuanyu Yi*, Haohan Weng*, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, Hanwang Zhang ( * co-first authors)

arXiv Preprint, Under Review 2025 Preprint

We propose Nautilus, a locality-aware autoencoder for artist-like mesh generation, which leverages the local properties of manifold meshes to achieve structural fidelity and efficient representation.

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

European Conference on Computer Vision (ECCV) 2024 Poster

We proposed effective multi-view consistency designs within the diffusion model that harmonize the inconsistent multi-view image guidance by integrating 3D Gaussian Splatting (3DGS) characteristics, offering high-quality 3DGS editing.

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Xiaokang Wei, Bowen Zhang, Xianghui Yang, Yuxuan Wang, Chunchao Guo, Xi Zhao, Yan Luximon

arXiv Preprint, Under Review 2025 Preprint

We present PBR3DGen, a two-stage method for generating meshes with high-quality PBR materials that integrates a novel multi-view PBR material estimation model and a 3D PBR mesh reconstruction model.

Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation

Yuxuan Wang, Xiaoyuan Liu

Empirical Methods in Natural Language Processing (EMNLP) 2024 Main Conference

We introduced a plug-and-play debiasing method for zero-shot VLMs, dynamically ensembling them to address the underrepresentation issue in Scene Graph Generation (SGG) models.

Pushing Rendering Boundaries: Hard Gaussian Splatting

Qingshan Xu, Jiequan Cui, Xuanyu Yi, Yuxuan Wang, Yuan Zhou, Yew-Soon Ong, Hanwang Zhang

arXiv Preprint, Under Review 2025 Preprint

We propose Hard Gaussian Splatting, dubbed HGS, which considers multi-view significant positional gradients and rendering errors to grow hard Gaussians that fill the gaps of classical Gaussian Splatting on 3D scenes, thus achieving superior novel view synthesis (NVS) results.

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou

European Conference on Computer Vision (ECCV) 2022 Poster

We introduced three video boundary understanding tasks on our new dataset, Kinetics-GEB+ (Generic Event Boundary Plus), which consists of over 170K boundaries associated with captions in 12K videos. We also designed a new Temporal-based Pairwise Difference (TPD) Modeling method for visual difference representation and achieved significant performance improvements.

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

Stan Weixian Lei*, Difei Gao*, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou ( * co-first authors)

AAAI Conference on Artificial Intelligence (AAAI) 2023 Oral

We introduced Scene Graph as Prompt (SGP) for symbolic replay, a real-data-free replay-based method for Continual Learning VQA, which overcomes the limitations of replay-based methods by leveraging the scene graph as an alternative to images for replay.

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant

Stan Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou

Empirical Methods in Natural Language Processing (EMNLP) 2022 Findings

We introduce a new dataset and a new task called Affordance-centric Question-driven Video Segment Retrieval (AQVSR), aiming at retrieving affordance-centric instructional video segments given users’ questions. To address the task, we developed a straightforward model called Dual Multimodal Encoders (DME).

Others about Me

I love music and enjoy playing the piano and guitar. I used to sing as a tenor in BUAA Chorus and enjoyed sharing some covers on Netease and QQ Music. Some of my best friends are incredible music players and singers, and I’m so thankful for all the happiness they’ve brought me through music.

A special thanks to my girlfriend, Sijing, for always being by my side with her support and love.