Hezhen Hu

I am a postdoctoral fellow in the VITA group at the University of Texas at Austin (UT Austin), working with Prof. Zhangyang Wang and Prof. Georgios Pavlakos. My research aims to build human-centered AI systems that treat human communication and interaction as first-class capabilities, grounded in embodiment and real-world context, enabling physically and socially meaningful interactions between people and AI. This vision has translated into sustained real-world deployments for Deaf communities. In parallel, I have been deeply engaged in education and outreach.

I received my PhD from the University of Science and Technology of China (USTC), supervised by Prof. Wengang Zhou and Prof. Houqiang Li. I served as a leader of the VSLRG research team on sign language understanding. From May 2022 to February 2023, I was fortunate to work as a research intern at Microsoft Research Asia, supervised by Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, and Fang Wen.

Please feel free to contact me if you are interested in these topics and would like to collaborate.

Email / Google Scholar

News

2026.02: Two main-conference papers (HumanNOVA, VLM-3R) and two Findings papers (WildAni4D, EgoTL) accepted to CVPR 2026.
2026.01: We are organizing the 1st GenSign workshop at CVPR 2026.
2025.10: We are organizing the 3rd AI3DCC workshop at ICCV 2025.

Selected Publications

Representative papers are highlighted. For the full list, please refer to Google Scholar.

	HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image
	Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos
	Conference on Computer Vision and Pattern Recognition (CVPR), 2026
	Project
	Photorealistic, universal and rapid 3D human avatar modeling from a single image.

	Expressive Gaussian Human Avatars from Monocular RGB Video
	Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang
	Conference on Neural Information Processing Systems (NeurIPS), 2024
	Project
	Expressive animatable avatars (hands and face), learned from in-the-wild monocular RGB video (no SMPL-X annotation required).

	SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding
	Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li
	IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
	Project
	[Extension of SignBERT] The first self-supervised pre-training framework for sign language understanding.

	MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
	Renjie Li, Ruijie Ye, Mingyang Wu, Hao (Frank) Yang, Zhiwen Fan, Hezhen Hu*, Zhengzhong Tu
	Preprint
	A scalable data annotation pipeline for context-aware human understanding.

	VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
	Zhiwen Fan, Jian Zhang, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, ... Zhangyang Wang, Rakesh Ranjan
	Conference on Computer Vision and Pattern Recognition (CVPR), 2026
	Project
	A VLM framework that couples 3D reconstructive instruction tuning with scalable training data curation for 3D reasoning.

	Uni-Sign: Toward Unified Sign Language Understanding at Scale
	Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li
	International Conference on Learning Representations (ICLR), 2025
	We propose a large-scale generative pre-training strategy that eliminates the gap between pre-training and downstream SLU tasks.

	Prior-aware Cross Modality Augmentation Learning for Continuous Sign Language Recognition
	Hezhen Hu, Junfu Pu, Wengang Zhou, Hang Fang, Houqiang Li
	IEEE Transactions on Multimedia (TMM), 2023
	We propose a novel cross-modality augmentation learning paradigm that incorporates prior knowledge for continuous SLR.

	Hand-Object Interaction Image Generation
	Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li
	Conference on Neural Information Processing Systems (NeurIPS), 2022
	Project
	We present a new task: hand-object interaction image generation. The task is challenging and relevant to many potential applications, such as online shopping.

	Collaborative Multilingual Sign Language Recognition: A Unified Framework
	Hezhen Hu, Junfu Pu, Wengang Zhou, Houqiang Li
	IEEE Transactions on Multimedia (TMM), 2022
	We are the first to explore the multilingual setting in continuous SLR and propose a unified framework targeting this problem.

	SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
	Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li
	IEEE International Conference on Computer Vision (ICCV), 2021
	The first self-supervised pre-training framework in isolated SLR. It incorporates a hand prior into masked modeling to better capture context in the sign language domain.

	Model-Aware Gesture-to-Gesture Translation
	Hezhen Hu, Weilun Wang, Wengang Zhou, Weichao Zhao, Houqiang Li
	IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
	The first model-aware framework in gesture-to-gesture translation.

	Hand-Model-Aware Sign Language Recognition
	Hezhen Hu, Wengang Zhou, Houqiang Li
	AAAI Conference on Artificial Intelligence (AAAI), 2021
	The first model-aware framework in isolated SLR.

	Global-local Enhancement Network for NMFs-aware Sign Language Recognition
	Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li
	ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2021
	The first dataset explicitly exploring the importance of non-manual features in sign language.

Impact & Outreach

Deaf-facing deployment: Our sign-language foundation models support real-world interactions in public services and robotics.

K–12 education: Co-authored an AI textbook series adopted nationwide, distributed across 500+ schools and reaching 300,000+ students.

Academic Services

I serve as an Area Chair (AC) for the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) and have reviewed for the following journals and conferences:

IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)
Conference on Neural Information Processing Systems (NeurIPS)
International Conference on Learning Representations (ICLR)
International Conference on Machine Learning (ICML)
AAAI Conference on Artificial Intelligence (AAAI)
International Joint Conference on Artificial Intelligence (IJCAI)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Multimedia (TMM)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

I am happy to serve as a reviewer. Please feel free to contact me.

Awards

MSRA "Stars of Tomorrow" Award, 2023
University Excellent Doctoral Dissertation, 2023
Outstanding Graduate Student, 2023
ECCV Looking at People CSLR Challenge, Ranked 1st, 2022
National Scholarship, 2021, 2017, 2015

Last Update: January 21st, 2025
Template borrowed from Jon Barron. Thanks!