About me

I am a researcher at OpenRobotLab, Shanghai AI Laboratory, working on embodied AI. My research focuses on constructing a comprehensive 3D understanding of our world from ego-centric, multi-modal inputs, thereby enabling embodied planning and physical interactions. In recent years, we have contributed several foundational works spanning general 3D perception (Cylinder3D, FCOS3D, DfM), embodied multi-modal 3D perception (EmbodiedScan, PointLLM, Grounded 3D-LLM), and embodied interaction (UniHSI, GRUtopia), along with continuing open-source efforts (MMDetection3D, OpenRobotLab).

Working with Dr. Jiangmiao Pang and Prof. Dahua Lin, our group is dedicated to building Embodied AGI systems and empowering academia and industry through open-source initiatives. If you are interested, please reach out to us for potential positions or collaborations.

I earned my Ph.D. from MMLab, The Chinese University of Hong Kong. Before that, I received my B.Eng. degree from Zhejiang University with the highest honors.

News

  • [2024/07] We release GRUtopia, MMScan and Grounded 3D-LLM.
  • [2024/03] EmbodiedScan and GenNBV are accepted to CVPR 2024. The Challenge Server is online!
  • [2024/02] We will host the Multi-View 3D Visual Grounding track in the Autonomous Grand Challenge.
  • [2024/01] UniHSI is accepted to ICLR 2024 as a Spotlight.
  • [2023/12] We release EmbodiedScan, the first ego-centric, multi-modal 3D perception suite for holistic 3D scene understanding.
  • [2023/08] We release PointLLM, the first work empowering LLMs to understand point clouds with solid evaluation and benchmarks.

Education

 The Chinese University of Hong Kong (CUHK)
  August 2019 - July 2023
  Ph.D. in Information Engineering
 Zhejiang University (ZJU)
  August 2015 - July 2019
  Major: B.E. in Information Engineering
  Minor: Advanced Honor Class of Engineering Education (ACEE), Chu Kochen Honors College

Selected Publications

Embodied Multi-Modal 3D Perception

 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs
 with 3D-awareness
  Chenming Zhu, Tai Wang†, Wenwei Zhang, Jiangmiao Pang, Xihui Liu†
  ArXiv preprint
  [Project Page] [Paper] [Code]
 MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical
 Grounded Language Annotations
  Ruiyuan Lyu*, Tai Wang*, Jingli Lin*, Shuai Yang*, et al.
  Conference on Neural Information Processing Systems (NeurIPS) 2024
  [Project Page] [Paper] [Code]
 Grounded 3D-LLM with Referent Tokens
  Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang
  ArXiv preprint
  [Project Page] [Paper] [Code]
 Empowering 3D Visual Grounding with Reasoning Capabilities
  Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu
  European Conference on Computer Vision (ECCV) 2024
  [Project Page] [Paper] [Code]
 PointLLM: Empowering Large Language Models to Understand Point Clouds
  Runsen Xu, Xiaolong Wang, Tai Wang†, Yilun Chen, Jiangmiao Pang†, Dahua Lin
  European Conference on Computer Vision (ECCV) 2024, Best Paper Candidate (all strong accept)
  [Project Page] [Paper] [Code]
 EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite
 Towards Embodied AI
  Tai Wang*, Xiaohan Mao*, Chenming Zhu*, et al.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024
  [Project Page] [Paper] [Code] [中文解读]

Embodied Interaction

 CooHOI: Learning Cooperative Human-Object Interaction with
 Manipulated Object Dynamics
  Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, et al.
  Conference on Neural Information Processing Systems (NeurIPS) 2024, Spotlight
  [Paper] [Code] (Coming Soon)
 GRUtopia: Dream General Robots in a City at Scale
  Hanqing Wang*, Jiahe Chen*, Wensi Huang*, Qingwei Ben*, Tai Wang*, Boyu Mi*, et al.
  ArXiv preprint
  [Project Page] [Paper] [Code] [Doc] [Youtube] [bilibili]
 UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts
  Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang
  International Conference on Learning Representations (ICLR) 2024, Spotlight
  [Project Page] [Paper] [Code]

Vision-Based 3D Perception
 DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera
 3D Object Detection and Tracking
  Qing Lian, Tai Wang, Jiangmiao Pang, Dahua Lin
  Conference on Robot Learning (CoRL) 2023
  [Paper] [Code]
 Vision-Centric BEV Perception: A Survey
  Yuexin Ma*, Tai Wang*, Xuyang Bai*, Huitong Yang, Yuenan Hou, Yaming Wang,
  Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024
  [Paper] [Code]
 Scene as Occupancy
  Chonghao Sima*, Wenwen Tong*, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu,
  Ping Luo, Dahua Lin, Hongyang Li
  End-to-End Autonomous Driving, CVPR 2023 Workshop and Challenge
  IEEE/CVF International Conference on Computer Vision (ICCV) 2023
  [Paper] [Code]
 Monocular 3D Object Detection with Depth from Motion
  Tai Wang, Jiangmiao Pang, Dahua Lin
  European Conference on Computer Vision (ECCV) 2022, Oral
  [Paper] [Code]
 Probabilistic and Geometric Depth: Detecting Objects in Perspective
  Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
  Conference on Robot Learning (CoRL) 2021
  [Paper] [Code] [Poster]
 FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
  Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
  ICCV Workshop on 3D Object Detection from Images (ICCVW) 2021, Best Paper Award
  1st place solution of vision-only methods in the nuScenes 3D detection challenge, NeurIPS 2020
  [Paper] [Code] [Slides] [Zhihu]

Voxel Representation Learning in LiDAR-Based Perception
 Position-Guided Point Cloud Panoptic Segmentation Transformer
  Zeqi Xiao*, Wenwei Zhang*, Tai Wang*, Chen Change Loy, Dahua Lin, Jiangmiao Pang
  International Journal of Computer Vision (IJCV) 2024
  [Paper] [Code]
 Cylindrical and Asymmetrical 3D Convolution Networks for
 LiDAR Segmentation
  Xinge Zhu*, Hui Zhou*, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021, Oral
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021
  [Paper] [Code] [TPAMI version] [Bibtex]
 Reconfigurable Voxels: A New Representation for LiDAR-Based
 Point Clouds
  Tai Wang, Xinge Zhu, Dahua Lin
  Conference on Robot Learning (CoRL) 2020
  [Paper] [Spotlight Talk]

Efficient Annotation of LiDAR Point Clouds
 FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based
 Point Clouds
  Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin
  ACM Symposium on User Interface Software and Technology (UIST) 2020, Poster
  [Full Tech Report] [Poster] [Poster Summary] [Demo]

Research Projects

 MMDetection3D: The Next-Generation Platform for General 3D Detection
  A versatile, open-source 3D object detection toolbox based on PyTorch
  MMDetection3D Contributors
  May 2020 – Now
  [Code] [Doc] [Bibtex]

Selected Awards

Teaching

  • Computer Vision (Undergraduate Course), Winter 2018 @ ZJU
  • IERG2080: Introduction to Systems Programming, Fall 2020 @ CUHK
  • IERG2470B/ESTR2308: Probability Models and Applications (Elite Students), Spring 2021 @ CUHK

Miscellaneous

Academic Services
I have served as a reviewer for CVPR, ICCV, ECCV, CoRL, NeurIPS, ICLR, ICML, WACV, TPAMI, IJCV, and TVCG.

Hobbies
I love 🏀 basketball (I am a big fan of Stephen Curry and Tracy McGrady) and 🎵 music/🎤 singing, and I am good at 🖌️ Chinese calligraphy (learned from MA Liangchen and MA Shanshuang).