Yantao Lai (赖彦涛)

I am a final-year master's student in the College of Computer Science and Technology at Nanjing University of Aeronautics and Astronautics. My master's thesis focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze). I also have a strong interest in several research directions, including Autonomous Driving, AIGC, and Vision-Language Models (VLMs). I am fortunate to be advised by Rong Quan (thesis advisor), Dong Liang, Wentong Li, and Jie Qin.

Previously, I was an undergraduate in the School of Internet at Anhui University, majoring in Intelligent Science and Technology, where I studied the foundations of computer science and artificial intelligence.

Résumé  /  Email  /  WeChat

profile photo
News
  • I am actively looking for full-time industry applied scientist / research scientist opportunities. Please contact me via email or WeChat if you have any leads.
  • [Oct 2025] I was awarded the National Scholarship for Postgraduates!
  • [Aug 2025] A patent related to goal-directed scanpath prediction has been granted!
  • [Aug 2025] One paper on object referring-guided scanpath prediction has been submitted to AAAI 2026.
  • [May 2025] I joined Baidu Apollo in Beijing as an Autonomous Driving Perception Model Algorithm Intern!
  • [May 2025] One paper on goal-directed scanpath prediction has been accepted to ICASSP 2025 (Oral)!
  • [Apr 2025] A patent related to gaze prediction for panoramic images has been granted!
  • [Dec 2024] I joined Xiaomi in Beijing as a Research Scientist Intern focusing on AIGC and large models!
  • [Oct 2024] One paper on gaze prediction for panoramic images has been accepted to ECCV 2024!
Research

I am broadly interested in Computer Vision, Autonomous Driving, and Multimodal AI (AIGC, Vision-Language Models). During my master's studies, my research has primarily focused on leveraging vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze). For more details, please refer to my résumé.

Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan*, Yantao Lai*, Mengyu Qiu, Dong Liang
ECCV, 2024
PDF / Bibtex / Code

A Human Eye Scanpath Prediction Method for Panoramic Images (一种面向全景图像的人眼扫视轨迹预测方法)
Rong Quan, Yantao Lai, Dong Liang, Mengyu Qiu, Jie Qin
Patent, Granted
PDF / Bibtex / Code

CLIPGaze: Zero-Shot Goal-Directed Scanpath Prediction Using CLIP
Yantao Lai, Rong Quan, Dong Liang, Jie Qin
ICASSP, 2025 (Oral)
PDF / Bibtex / Code

A Goal-Directed Scanpath Prediction Method (一种目标导向的扫视路径预测方法)
Rong Quan, Yantao Lai, Dong Liang, Jie Qin
Patent, Disclosed
PDF / Bibtex / Code


Webpage template from Jon Barron