About me
I am a researcher at OpenRobotLab, Shanghai AI Laboratory, working on embodied AI. My research focuses on constructing a comprehensive 3D understanding of our world from ego-centric, multi-modal inputs, thereby enabling embodied planning and physical interactions. In recent years, we have contributed to several fundamental directions, ranging from general 3D perception (Cylinder3D, FCOS3D, DfM) and embodied multi-modal 3D perception (EmbodiedScan, PointLLM, Grounded 3D-LLM) to embodied interaction (UniHSI, GRUtopia), alongside continuing open-source efforts (MMDetection3D, OpenRobotLab).
Working with Dr. Jiangmiao Pang and Prof. Dahua Lin, our group is dedicated to building Embodied AGI systems and empowering academia and industry through open-source initiatives. If you are interested, please reach out to us for potential positions or collaborations.
I earned my Ph.D. degree from MMLab, The Chinese University of Hong Kong. Before that, I received my B.Eng degree from Zhejiang University with the highest honors.
News
- [2024/07] We release GRUtopia, MMScan and Grounded 3D-LLM.
- [2024/03] EmbodiedScan and GenNBV are accepted by CVPR 2024. The Challenge Server is online!
- [2024/02] We will host the Multi-View 3D Visual Grounding track in the Autonomous Grand Challenge.
- [2024/01] UniHSI is accepted by ICLR 2024 as Spotlight.
- [2023/12] We release EmbodiedScan, the first ego-centric, multi-modal 3D perception suite for holistic 3D scene understanding.
- [2023/08] We release PointLLM, the first work empowering LLMs to understand point clouds with solid evaluation and benchmarks.
Education
- The Chinese University of Hong Kong (CUHK)
- August 2019 - July 2023
- Ph.D. in Information Engineering
- Zhejiang University (ZJU)
- August 2015 - July 2019
- Major: B.E. in Information Engineering
- Minor: Advanced Honor Class of Engineering Education (ACEE), Chu Kochen Honors College
Selected Publications
Embodied Multi-Modal 3D Perception
- LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
- Chenming Zhu, Tai Wang†, Wenwei Zhang, Jiangmiao Pang, Xihui Liu†
- ArXiv preprint
- [Project Page] [Paper] [Code]
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
- Ruiyuan Lyu*, Tai Wang*, Jingli Lin*, Shuai Yang*, et al.
- Conference on Neural Information Processing Systems (NeurIPS) 2024
- [Project Page] [Paper] [Code]
- Grounded 3D-LLM with Referent Tokens
- Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang
- ArXiv preprint
- [Project Page] [Paper] [Code]
- Empowering 3D Visual Grounding with Reasoning Capabilities
- Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu
- European Conference on Computer Vision (ECCV) 2024
- [Project Page] [Paper] [Code]
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- Runsen Xu, Xiaolong Wang, Tai Wang†, Yilun Chen, Jiangmiao Pang†, Dahua Lin
- European Conference on Computer Vision (ECCV) 2024, Best Paper Candidate (all strong accept)
- [Project Page] [Paper] [Code]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
- Tai Wang*, Xiaohan Mao*, Chenming Zhu*, et al.
- IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024
- [Project Page] [Paper] [Code] [中文解读]
Embodied Interaction
- CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics
- Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, et al.
- Conference on Neural Information Processing Systems (NeurIPS) 2024, Spotlight
- [Paper] [Code] (Coming Soon)
- GRUtopia: Dream General Robots in a City at Scale
- Hanqing Wang*, Jiahe Chen*, Wensi Huang*, Qingwei Ben*, Tai Wang*, Boyu Mi*, et al.
- ArXiv preprint
- [Project Page] [Paper] [Code] [Doc] [Youtube] [bilibili]
- UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts
- Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang
- International Conference on Learning Representations (ICLR) 2024, Spotlight
- [Project Page] [Paper] [Code]
Vision-Based 3D Perception
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
- Qing Lian, Tai Wang, Jiangmiao Pang, Dahua Lin
- Conference on Robot Learning (CoRL) 2023
- [Paper] [Code]
- Vision-Centric BEV Perception: A Survey
- Yuexin Ma*, Tai Wang*, Xuyang Bai*, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024
- [Paper] [Code]
- Scene as Occupancy
- Chonghao Sima*, Wenwen Tong*, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
- End-to-End Autonomous Driving, CVPR 2023 Workshop and Challenge
- IEEE/CVF International Conference on Computer Vision (ICCV) 2023
- [Paper] [Code]
- Monocular 3D Object Detection with Depth from Motion
- Tai Wang, Jiangmiao Pang, Dahua Lin
- European Conference on Computer Vision (ECCV) 2022, Oral
- [Paper] [Code]
- Probabilistic and Geometric Depth: Detecting Objects in Perspective
- Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
- Conference on Robot Learning (CoRL) 2021
- [Paper] [Code] [Poster]
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
- Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
- ICCV Workshop on 3D Object Detection from Images (ICCVW) 2021, Best Paper Award
- 1st place solution of vision-only methods in the nuScenes 3D detection challenge, NeurIPS 2020
- [Paper] [Code] [Slides] [Zhihu]
Voxel Representation Learning in LiDAR-Based Perception
- Position-Guided Point Cloud Panoptic Segmentation Transformer
- Zeqi Xiao*, Wenwei Zhang*, Tai Wang*, Chen Change Loy, Dahua Lin, Jiangmiao Pang
- International Journal of Computer Vision (IJCV) 2024
- [Paper] [Code]
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
- Xinge Zhu*, Hui Zhou*, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin
- IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021, Oral
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021
- [Paper] [Code] [TPAMI version] [Bibtex]
- Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds
- Tai Wang, Xinge Zhu, Dahua Lin
- Conference on Robot Learning (CoRL) 2020
- [Paper] [Spotlight Talk]
Efficient Annotation of LiDAR Point Clouds
- FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based Point Clouds
- Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin
- ACM Symposium on User Interface Software and Technology (UIST) 2020, Poster
- [Full Tech Report] [Poster] [Poster Summary] [Demo]
Research Projects
- MMDetection3D: The Next-Generation Platform for General 3D Detection
- A versatile, open-source 3D object detection toolbox based on PyTorch
- MMDetection3D Contributors
- May 2020 – Now
- [Code] [Doc] [Bibtex]
Selected Awards
- Runner-up of Waymo Camera-Only 3D Detection Challenge, CVPR 2022
- Best Paper Award of Workshop on 3D Object Detection from Images, ICCV 2021
- 1st place of vision-only track and best PKL award of overall track, nuScenes 3D Detection Challenge, NeurIPS 2020
- Runner-up of nuScenes LiDAR Segmentation Challenge, NeurIPS 2020
- Gold Medal of Kaggle Competition (Top 1% of Lyft 3D Detection Challenge), NeurIPS 2019
- Hong Kong PhD Fellowship (HKPFS), 2019
- Chu Kochen Scholarship (Highest scholarship at Zhejiang University), 2018
- Top 10 Students of ZJU (Highest honor for 5 undergraduates/graduates), 2018
- National Scholarship (1.5%), 2017-2018
- First Prize in Physics Competition for Undergraduates, 2017
Teaching
- Computer Vision (Undergraduate Course), Winter 2018 @ ZJU
- IERG2080: Introduction to Systems Programming, Fall 2020 @ CUHK
- IERG2470B/ESTR2308: Probability Models and Applications (Elite Students), Spring 2021 @ CUHK
Miscellaneous
Academic Services
I have served as a reviewer for CVPR, ICCV, ECCV, CoRL, NeurIPS, ICLR, ICML, WACV, TPAMI, IJCV, and TVCG.
Hobbies
I love 🏀 basketball (I am a big fan of Stephen Curry and Tracy McGrady) and 🎵 music/🎤 singing, and I am good at 🖌️ Chinese calligraphy (learned from MA Liangchen and MA Shanshuang).