About me

I am a Research Scientist at the Embodied AI Center, Shanghai AI Laboratory, where I lead a team working on 3D Perception and Navigation Intelligence. My research focuses on building a foundation model that can comprehensively understand our 3D world (a.k.a. Spatial Intelligence), especially from ego-centric observations, and ultimately enable general physical intelligence. In recent years, we have contributed to several fundamental lines of work, ranging from general 3D perception (Cylinder3D, FCOS3D, DfM) and embodied multi-modal 3D perception (EmbodiedScan, PointLLM, LLaVA-3D) to downstream embodied tasks (NavDP, GRUtopia), along with continuing open-source efforts (MMDetection3D, OpenRobotLab).

Working with Dr. Jiangmiao Pang, we are dedicated to building Embodied AGI systems and empowering academia and industry through open-source initiatives. If you are interested, please reach out to us for potential positions or collaborations.

I earned my Ph.D. degree from MMLab, The Chinese University of Hong Kong, supervised by Prof. Dahua Lin. Before that, I received my B.Eng. degree from Zhejiang University with the highest honors.

News

[2025/06] We release NavDP, StreamVLN, MMSI-Bench and GLEAM.
[2024/07] We release GRUtopia, MMScan and Grounded 3D-LLM.
[2024/03] EmbodiedScan and GenNBV are accepted by CVPR 2024. The Challenge Server is online!
[2024/02] We will host the Multi-View 3D Visual Grounding track in the Autonomous Grand Challenge.
[2024/01] UniHSI is accepted by ICLR 2024 as Spotlight.
[2023/12] We release EmbodiedScan, the first ego-centric, multi-modal 3D perception suite for holistic 3D scene understanding.
[2023/08] We release PointLLM, the first work empowering LLMs to understand point clouds with solid evaluation and benchmarks.

Education


The Chinese University of Hong Kong (CUHK): August 2019 - July 2023; Ph.D. in Information Engineering

Zhejiang University (ZJU): August 2015 - July 2019; Major: B.E. in Information Engineering; Minor: Advanced Honor Class of Engineering Education (ACEE), Chu Kochen Honors College

Selected Publications

Navigation & Exploration


StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling: Meng Wei*, Chenyang Wan*, Xiqian Yu*, Tai Wang*‡, et al.; ArXiv preprint; [Project Page] [Paper] [Code] [Zhihu]

NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance: Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, …, Tai Wang†, Jiangmiao Pang†; ArXiv preprint; [Project Page] [Paper] [Code] [Zhihu]

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes: Xiao Chen, Tai Wang, Quanyi Li, Tao Huang, Jiangmiao Pang, Tianfan Xue; IEEE/CVF International Conference on Computer Vision (ICCV) 2025; [Project Page] [Paper] [Code]

Embodied Multi-Modal 3D Perception


MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence: Sihan Yang*, Runsen Xu*‡, Yiman Xie, Sizhe Yang, …, Tai Wang†, Jiangmiao Pang†; ArXiv preprint; [Project Page] [Paper] [Code] [中文解读]

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness: Chenming Zhu, Tai Wang†, Wenwei Zhang, Jiangmiao Pang, Xihui Liu†; IEEE/CVF International Conference on Computer Vision (ICCV) 2025; [Project Page] [Paper] [Code]

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations: Ruiyuan Lyu*, Tai Wang*, Jingli Lin*, Shuai Yang*, et al.; Conference on Neural Information Processing Systems (NeurIPS) 2024; [Project Page] [Paper] [Code]

Grounded 3D-LLM with Referent Tokens: Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang; ArXiv preprint; [Project Page] [Paper] [Code]

Empowering 3D Visual Grounding with Reasoning Capabilities: Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu; European Conference on Computer Vision (ECCV) 2024; [Project Page] [Paper] [Code]

PointLLM: Empowering Large Language Models to Understand Point Clouds: Runsen Xu, Xiaolong Wang, Tai Wang†, Yilun Chen, Jiangmiao Pang†, Dahua Lin; European Conference on Computer Vision (ECCV) 2024, Best Paper Candidate (all strong accept); [Project Page] [Paper] [Code]

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI: Tai Wang*, Xiaohan Mao*, Chenming Zhu*, et al.; IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024; [Project Page] [Paper] [Code] [中文解读]

Embodied Interaction & Simulation


CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, et al.; Conference on Neural Information Processing Systems (NeurIPS) 2024, Spotlight; [Paper] [Code]

GRUtopia: Dream General Robots in a City at Scale: Hanqing Wang*, Jiahe Chen*, Wensi Huang*, Qingwei Ben*, Tai Wang*, Boyu Mi*, et al.; ArXiv preprint; [Project Page] [Paper] [Code] [Doc] [Youtube] [bilibili]

UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts: Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang; International Conference on Learning Representations (ICLR) 2024, Spotlight; [Project Page] [Paper] [Code]

Vision-Based 3D Perception

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking: Qing Lian, Tai Wang, Jiangmiao Pang, Dahua Lin; Conference on Robot Learning (CoRL) 2023; [Paper] [Code]

Vision-Centric BEV Perception: A Survey: Yuexin Ma*, Tai Wang*, Xuyang Bai*, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu; IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024; [Paper] [Code]

Scene as Occupancy: Chonghao Sima*, Wenwen Tong*, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li; End-to-End Autonomous Driving, CVPR 2023 Workshop and Challenge; IEEE/CVF International Conference on Computer Vision (ICCV) 2023; [Paper] [Code]

Monocular 3D Object Detection with Depth from Motion: Tai Wang, Jiangmiao Pang, Dahua Lin; European Conference on Computer Vision (ECCV) 2022, Oral; [Paper] [Code]

Probabilistic and Geometric Depth: Detecting Objects in Perspective: Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin; Conference on Robot Learning (CoRL) 2021; [Paper] [Code] [Poster]

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection: Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin; ICCV Workshop on 3D Object Detection from Images (ICCVW) 2021, Best Paper Award; 1st place solution of vision-only methods in the nuScenes 3D detection challenge, NeurIPS 2020; [Paper] [Code] [Slides] [Zhihu]

Voxel Representation Learning in LiDAR-Based Perception

Position-Guided Point Cloud Panoptic Segmentation Transformer: Zeqi Xiao*, Wenwei Zhang*, Tai Wang*, Chen Change Loy, Dahua Lin, Jiangmiao Pang; International Journal of Computer Vision (IJCV) 2024; [Paper] [Code]

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation: Xinge Zhu*, Hui Zhou*, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin; IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021, Oral; IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021; [Paper] [Code] [TPAMI version] [Bibtex]

Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds: Tai Wang, Xinge Zhu, Dahua Lin; Conference on Robot Learning (CoRL) 2020; [Paper] [Spotlight Talk]

Efficient Annotation of LiDAR Point Clouds

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based Point Clouds: Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin; ACM Symposium on User Interface Software and Technology (UIST) 2020, Poster; [Full Tech Report] [Poster] [Poster Summary] [Demo]

Research Projects


MMDetection3D: The Next-Generation Platform for General 3D Detection: A versatile, open-source 3D object detection toolbox based on PyTorch; MMDetection3D Contributors; May 2020 – Now; [Code] [Doc] [Bibtex]

Selected Awards

Best Paper Award Candidate of ECCV 2024
Runner-up of Waymo Camera-Only 3D Detection Challenge, CVPR 2022
Best Paper Award of Workshop on 3D Object Detection from Images, ICCV 2021
1st place of vision-only track and best PKL award of overall track, NuScenes 3D Detection Challenge, NeurIPS 2020
Runner-up of NuScenes LiDAR Segmentation Challenge, NeurIPS 2020
Gold Medal of Kaggle Competition (Top 1% of Lyft 3D Detection Challenge), NeurIPS 2019
Hong Kong PhD Fellowship (HKPFS), 2019
Chu Kochen Scholarship (Highest scholarship at Zhejiang University), 2018
Top 10 Students of ZJU (Highest honor for 5 undergraduates/graduates), 2018
National Scholarship (1.5%), 2017-2018
First Prize in Physics Competition for Undergraduates, 2017

Teaching

Computer Vision (Undergraduate Course), Winter 2018 @ ZJU
IERG2080: Introduction to Systems Programming, Fall 2020 @ CUHK
IERG2470B/ESTR2308: Probability Models and Applications (Elite Students), Spring 2021 @ CUHK

Miscellaneous

Academic Services
I have served as a reviewer for CVPR, ICCV, ECCV, CoRL, NeurIPS, ICLR, ICML, WACV, TPAMI, IJCV, and TVCG.

Hobbies
I love 🏀basketball (I am a big fan of Stephen Curry and Tracy McGrady) and 🎵music/🎤singing, and I am good at 🖌️Chinese calligraphy (learned from MA Liangchen and MA Shanshuang).

Tai Wang 王泰