Yantao Lai (赖彦涛)

I am a final-year master's student in the College of Computer Science and Technology at Nanjing University of Aeronautics and Astronautics. My master's thesis focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze). I also have a strong interest in several research directions, including Autonomous Driving, AIGC, and Vision-Language Models (VLMs). I am fortunate to be advised by Rong Quan (thesis advisor), Dong Liang, Wentong Li, and Jie Qin.

Previously, I was an undergraduate in the School of Internet at Anhui University, majoring in Intelligent Science and Technology, where I studied the foundations of computer science and artificial intelligence.

Résumé  /  Email  /  WeChat

profile photo
News
  • I am actively looking for full-time industry applied scientist / research scientist opportunities. Please contact me via email or WeChat if you have any leads.
  • [Oct 2025] I was awarded the National Scholarship for Postgraduates!
  • [Aug 2025] A patent related to goal-directed scanpath prediction has been granted!
  • [Aug 2025] One paper on object referring-guided scanpath prediction has been submitted to AAAI 2026.
  • [May 2025] I joined Baidu Apollo in Beijing as an Autonomous Driving Perception Model Algorithm Intern!
  • [May 2025] One paper on goal-directed scanpath prediction has been accepted to ICASSP 2025 (Oral)!
  • [Apr 2025] A patent related to gaze prediction for panoramic images has been granted!
  • [Dec 2024] I joined Xiaomi in Beijing as a Research Scientist Intern focusing on AIGC and large models!
  • [Oct 2024] One paper on gaze prediction for panoramic images has been accepted to ECCV 2024!
Research

I am broadly interested in Computer Vision, Autonomous Driving, and Multimodal AI (AIGC, Vision-Language Models). During my master's studies, my research has primarily focused on leveraging vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze). For more details, please refer to my résumé.

Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan*, Yantao Lai*, Mengyu Qiu, Dong Liang
ECCV, 2024
PDF / Bibtex / Code

A Human Eye Scanpath Prediction Method for Panoramic Images (一种面向全景图像的人眼扫视轨迹预测方法)
Rong Quan, Yantao Lai, Dong Liang, Mengyu Qiu, Jie Qin
Patent, Granted
PDF / Bibtex / Code

CLIPGaze: Zero-Shot Goal-Directed Scanpath Prediction Using CLIP
Yantao Lai, Rong Quan, Dong Liang, Jie Qin
ICASSP, 2025 (Oral)
PDF / Bibtex / Code

A Goal-Directed Scanpath Prediction Method (一种目标导向的扫视路径预测方法)
Rong Quan, Yantao Lai, Dong Liang, Jie Qin
Patent, Disclosed
PDF / Bibtex / Code


Webpage template from Jon Barron