About me

I am a Research Scientist at the Embodied AI Center, Shanghai AI Laboratory, where I lead a team working on 3D Perception and Navigation Intelligence. My research focuses on building a foundation model that can comprehensively understand our 3D world (a.k.a. Spatial Intelligence), especially from ego-centric observations, and ultimately enable general physical intelligence. In recent years, we have contributed work spanning general 3D perception (Cylinder3D, FCOS3D, DfM), embodied multi-modal 3D perception (EmbodiedScan, PointLLM, LLaVA-3D), and downstream embodied tasks (NavDP, GRUtopia), with continuing open-source efforts (MMDetection3D, OpenRobotLab).

Together with Dr. Jiangmiao Pang, our team is dedicated to building Embodied AGI systems and empowering academia and industry through open-source initiatives. If you are interested, please reach out to us about potential positions or collaborations.

I earned my Ph.D. degree from MMLab, The Chinese University of Hong Kong, under the supervision of Prof. Dahua Lin. Before that, I received my B.Eng. degree from Zhejiang University with the highest honors.

Education

 The Chinese University of Hong Kong (CUHK)
  August 2019 - July 2023
  Ph.D. in Information Engineering
 Zhejiang University (ZJU)
  August 2015 - July 2019
  Major: B.E. in Information Engineering
  Minor: Advanced Honor Class of Engineering Education (ACEE), Chu Kochen Honors College

Selected Publications

Navigation & Exploration

 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
  Meng Wei*, Chenyang Wan*, Xiqian Yu*, Tai Wang*‡, et al.
  Preprint to appear on arXiv
  [Project Page] [Paper](Coming Soon) [Code](Coming Soon) [Zhihu]
 NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
  Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, …, Tai Wang†, Jiangmiao Pang†
  ArXiv preprint
  [Project Page] [Paper] [Code] [Zhihu]
 GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
  Xiao Chen, Tai Wang, Quanyi Li, Tao Huang, Jiangmiao Pang, Tianfan Xue
  ArXiv preprint
  [Project Page] [Paper] [Code]

Embodied Multi-Modal 3D Perception

 MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
  Sihan Yang*, Runsen Xu*‡, Yiman Xie, Sizhe Yang, …, Tai Wang†, Jiangmiao Pang†
  ArXiv preprint
  [Project Page] [Paper] [Code] [Chinese Explainer]
 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
  Chenming Zhu, Tai Wang†, Wenwei Zhang, Jiangmiao Pang, Xihui Liu†
  ArXiv preprint
  [Project Page] [Paper] [Code]
 MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
  Ruiyuan Lyu*, Tai Wang*, Jingli Lin*, Shuai Yang*, et al.
  Conference on Neural Information Processing Systems (NeurIPS) 2024
  [Project Page] [Paper] [Code]
 Grounded 3D-LLM with Referent Tokens
  Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang
  ArXiv preprint
  [Project Page] [Paper] [Code]
 Empowering 3D Visual Grounding with Reasoning Capabilities
  Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu
  European Conference on Computer Vision (ECCV) 2024
  [Project Page] [Paper] [Code]
 PointLLM: Empowering Large Language Models to Understand Point Clouds
  Runsen Xu, Xiaolong Wang, Tai Wang†, Yilun Chen, Jiangmiao Pang†, Dahua Lin
  European Conference on Computer Vision (ECCV) 2024, Best Paper Candidate (all strong accept)
  [Project Page] [Paper] [Code]
 EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
  Tai Wang*, Xiaohan Mao*, Chenming Zhu*, et al.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024
  [Project Page] [Paper] [Code] [Chinese Explainer]

Embodied Interaction & Simulation

 CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics
  Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, et al.
  Conference on Neural Information Processing Systems (NeurIPS) 2024, Spotlight
  [Paper] [Code](Coming Soon)
 GRUtopia: Dream General Robots in a City at Scale
  Hanqing Wang*, Jiahe Chen*, Wensi Huang*, Qingwei Ben*, Tai Wang*, Boyu Mi*, et al.
  ArXiv preprint
  [Project Page] [Paper] [Code] [Doc] [Youtube] [bilibili]
 UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts
  Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang
  International Conference on Learning Representations (ICLR) 2024, Spotlight
  [Project Page] [Paper] [Code]

Vision-Based 3D Perception

 DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
  Qing Lian, Tai Wang, Jiangmiao Pang, Dahua Lin
  Conference on Robot Learning (CoRL) 2023
  [Paper] [Code]
 Vision-Centric BEV Perception: A Survey
  Yuexin Ma*, Tai Wang*, Xuyang Bai*, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024
  [Paper] [Code]
 Scene as Occupancy
  Chonghao Sima*, Wenwen Tong*, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
  End-to-End Autonomous Driving, CVPR 2023 Workshop and Challenge
  IEEE/CVF International Conference on Computer Vision (ICCV) 2023
  [Paper] [Code]
 Monocular 3D Object Detection with Depth from Motion
  Tai Wang, Jiangmiao Pang, Dahua Lin
  European Conference on Computer Vision (ECCV) 2022, Oral
  [Paper] [Code]
 Probabilistic and Geometric Depth: Detecting Objects in Perspective
  Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
  Conference on Robot Learning (CoRL) 2021
  [Paper] [Code] [Poster]
 FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
  Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
  ICCV Workshop on 3D Object Detection from Images (ICCVW) 2021, Best Paper Award
  1st place solution of vision-only methods in the nuScenes 3D detection challenge, NeurIPS 2020
  [Paper] [Code] [Slides] [Zhihu]

Voxel Representation Learning in LiDAR-Based Perception

 Position-Guided Point Cloud Panoptic Segmentation Transformer
  Zeqi Xiao*, Wenwei Zhang*, Tai Wang*, Chen Change Loy, Dahua Lin, Jiangmiao Pang
  International Journal of Computer Vision (IJCV) 2024
  [Paper] [Code]
 Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
  Xinge Zhu*, Hui Zhou*, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021, Oral
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021
  [Paper] [Code] [TPAMI version] [Bibtex]
 Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds
  Tai Wang, Xinge Zhu, Dahua Lin
  Conference on Robot Learning (CoRL) 2020
  [Paper] [Spotlight Talk]

Efficient Annotation of LiDAR Point Clouds

 FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based Point Clouds
  Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin
  ACM Symposium on User Interface Software and Technology (UIST) 2020, Poster
  [Full Tech Report] [Poster] [Poster Summary] [Demo]

Research Projects

 MMDetection3D: The Next-Generation Platform for General 3D Detection
  A versatile, open-source 3D object detection toolbox based on PyTorch
  MMDetection3D Contributors
  May 2020 – Present
  [Code] [Doc] [Bibtex]

Teaching

  • Computer Vision (Undergraduate Course), Winter 2018 @ ZJU
  • IERG2080: Introduction to Systems Programming, Fall 2020 @ CUHK
  • IERG2470B/ESTR2308: Probability Models and Applications (Elite Students), Spring 2021 @ CUHK

Miscellaneous

Academic Services
I have served as a reviewer for CVPR, ICCV, ECCV, CoRL, NeurIPS, ICLR, ICML, WACV, TPAMI, IJCV, and TVCG.

Hobbies
I love 🏀 basketball (I am a big fan of Stephen Curry and Tracy McGrady) and 🎵 music/🎤 singing, and I am good at 🖌️ Chinese calligraphy (learned from MA Liangchen and MA Shanshuang).