Yuxuan Wang is a third-year Ph.D. candidate at the MReal Lab, Nanyang Technological University (NTU), supervised by Prof.
Hanwang Zhang.
Before that, he was a Master of Science student at the Show Lab, National University of Singapore (NUS), supervised by Prof.
Mike Zheng Shou.
He also works closely with Prof. Long Chen from HKUST and Prof. Na Zhao from SUTD.
His research interests include, but are not limited to, 3D Generation, 3D Scene Editing and Understanding, and Vision-Language Understanding.
Yuxuan Wang*, Xuanyu Yi*, Haohan Weng*, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, Hanwang Zhang (* co-first authors)
arXiv Preprint, Under Review, 2025
We propose Nautilus, a locality-aware autoencoder for artist-like mesh generation, which leverages the local properties of manifold meshes to achieve structural fidelity and efficient representation.
Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
European Conference on Computer Vision (ECCV) 2024 Poster
We proposed effective multi-view consistency designs for the diffusion model that harmonize inconsistent multi-view image guidance by integrating 3D Gaussian Splatting (3DGS) characteristics, enabling high-quality 3DGS editing.
Xiaokang Wei, Bowen Zhang, Xianghui Yang, Yuxuan Wang, Chunchao Guo, Xi Zhao, Yan Luximon
arXiv Preprint, Under Review, 2025
In this work, we present PBR3DGen, a two-stage mesh generation method with high-quality PBR materials that integrates a novel multi-view PBR material estimation model with a 3D PBR mesh reconstruction model.
Yuxuan Wang, Xiaoyuan Liu
Empirical Methods in Natural Language Processing (EMNLP) 2024 Main Conference
We introduced a plug-and-play debiasing method for zero-shot VLMs, dynamically ensembling them to address the underrepresentation issue in Scene Graph Generation (SGG) models.
Qingshan Xu, Jiequan Cui, Xuanyu Yi, Yuxuan Wang, Yuan Zhou, Yew-Soon Ong, Hanwang Zhang
arXiv Preprint, Under Review, 2025
We propose Hard Gaussian Splatting, dubbed HGS, which considers significant multi-view positional gradients and rendering errors to grow hard Gaussians that fill the gaps left by classical Gaussian Splatting on 3D scenes, achieving superior novel view synthesis (NVS) results.
Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou
European Conference on Computer Vision (ECCV) 2022 Poster
We introduced three video boundary understanding tasks on our new dataset, Kinetics-GEB+ (Generic Event Boundary Plus), which consists of over 170K boundaries with associated captions in 12K videos. We also designed a new Temporal-based Pairwise Difference (TPD) Modeling method for visual difference representation, achieving significant performance improvements.
Stan Weixian Lei*, Difei Gao*, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou (* co-first authors)
AAAI Conference on Artificial Intelligence (AAAI) 2023 Oral
We introduced Scene Graph as Prompt (SGP) for symbolic replay, a real-data-free replay-based method for Continual Learning VQA, which overcomes the limitations of replay-based methods by leveraging the scene graph as an alternative to images for replay.
Stan Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou
Empirical Methods in Natural Language Processing (EMNLP) 2022 Findings
We introduced a new dataset and task, Affordance-centric Question-driven Video Segment Retrieval (AQVSR), which aims at retrieving affordance-centric instructional video segments given users’ questions. To address the task, we developed a straightforward model called Dual Multimodal Encoders (DME).
I love music and enjoy playing the piano and guitar. I used to sing as a tenor in the BUAA Chorus and enjoyed sharing covers on NetEase and QQ Music. Some of my best friends are incredible musicians and singers, and I’m so thankful for all the happiness they’ve brought me through music.
A special thanks to my girlfriend, Sijing, for always being by my side with her support and love.