|
Yantao Lai (赖彦涛)
I am a final-year master's student in the College of Computer Science and Technology at Nanjing University of Aeronautics and Astronautics.
My thesis focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze).
I am also strongly interested in several other research directions, including Autonomous Driving, AIGC, and Vision-Language Models (VLMs).
I am fortunate to be advised by Rong Quan (dissertation advisor), Dong Liang, Wentong Li, and Jie Qin.
Previously, I was an undergraduate student at the School of Internet at Anhui University, majoring in Intelligent Science and Technology, where I primarily studied foundational theories related to computer science and artificial intelligence.
Résumé  / 
Email  / 
WeChat
|
|
News
- I am actively looking for full-time industry applied scientist / research scientist opportunities. Please contact me via email or WeChat if you have any leads.
- [Oct 2025] I was awarded the National Scholarship for Postgraduates!
- [Aug 2025] A patent related to goal-directed scanpath prediction has been granted!
- [Aug 2025] A paper on object-referring-guided scanpath prediction has been submitted to AAAI 2026.
- [May 2025] I joined Baidu Apollo, Beijing, as an Autonomous Driving Perception Model Algorithm Intern!
- [May 2025] A paper on goal-directed scanpath prediction has been accepted to ICASSP 2025 (Oral)!
- [Apr 2025] A patent related to gaze prediction for panoramic images has been granted!
- [Dec 2024] I joined Xiaomi, Beijing, as a Research Scientist Intern focusing on AIGC and large models!
- [Oct 2024] A paper on gaze prediction for panoramic images has been accepted to ECCV 2024!
|
|
Research
I am broadly interested in Computer Vision, Autonomous Driving, and Multimodal AI (AIGC, Vision-Language Models). During my master's studies, my research has primarily focused on leveraging vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) to model human visual attention (eye gaze). For more details, refer to my résumé.
|
|
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan*,
Yantao Lai*,
Mengyu Qiu,
Dong Liang†
ECCV, 2024
PDF
/
Bibtex
/
Code
|
|
A Method for Predicting Human Eye Scanpaths on Panoramic Images (一种面向全景图像的人眼扫视轨迹预测方法)
Rong Quan,
Yantao Lai,
Dong Liang,
Mengyu Qiu,
Jie Qin
Patent, Granted
PDF
/
Bibtex
/
Code
|
|
CLIPGaze: Zero-Shot Goal-Directed Scanpath Prediction Using CLIP
Yantao Lai,
Rong Quan,
Dong Liang†,
Jie Qin
ICASSP (Oral), 2025
PDF
/
Bibtex
/
Code
|
|
A Goal-Directed Scanpath Prediction Method (一种目标导向的扫视路径预测方法)
Rong Quan,
Yantao Lai,
Dong Liang,
Jie Qin
Patent, Disclosed
PDF
/
Bibtex
/
Code
|
|