Gaze Point Estimation via Joint Learning of Facial Features and Screen Projection
Abstract
In recent years, gaze estimation has attracted considerable interest in areas including human–computer interaction, virtual reality, and user engagement analysis. Despite significant advances in convolutional neural network (CNN) techniques, directly and accurately predicting the point of gaze (PoG) in unconstrained settings remains difficult. This study proposes a gaze point estimation network (L1fcs-Net) that combines facial features with positional features derived from a two-dimensional array obtained by projecting the face relative to the screen. The approach incorporates a Face-grid branch that helps the network extract features such as the position and distance of the face relative to the screen. In addition, independent fully connected layers regress the x and y coordinates separately, so that the model can better capture gaze movement in the horizontal and vertical directions. We also employ a multi-loss approach that balances classification and regression losses to reduce gaze point prediction error. To evaluate the model, we conducted experiments on the MPIIFaceGaze dataset, which was collected under unconstrained settings. The proposed model achieves state-of-the-art performance on this dataset, with a gaze point prediction error of 2.05 cm.
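Two ideas named in the abstract, the face grid and the combined classification and regression loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the grid resolution, the number of bins, or the exact loss, so the function names (`face_grid`, `axis_loss`), the 25 x 25 default grid, the binning of each screen coordinate, the absolute-error regression term, and the weight `alpha` are all assumptions chosen for illustration.

```python
import math

def face_grid(frame_w, frame_h, box, grid_size=25):
    """Binary grid_size x grid_size mask of where the face box lies in the frame.

    box = (x, y, w, h) in pixels. The grid size is an assumed value.
    """
    x, y, w, h = box
    # Scale the box corners from pixel coordinates to grid cells.
    c0 = max(0, math.floor(x / frame_w * grid_size))
    r0 = max(0, math.floor(y / frame_h * grid_size))
    c1 = min(grid_size, math.ceil((x + w) / frame_w * grid_size))
    r1 = min(grid_size, math.ceil((y + h) / frame_h * grid_size))
    return [[1 if r0 <= r < r1 and c0 <= c < c1 else 0
             for c in range(grid_size)]
            for r in range(grid_size)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def axis_loss(logits, target_cm, lo, bin_width, alpha=1.0):
    """Loss for one screen axis (x or y), computed independently per axis.

    Cross-entropy over coordinate bins plus alpha times the absolute error
    of the expected coordinate. Returns (loss, predicted coordinate in cm).
    """
    probs = softmax(logits)
    n = len(logits)
    # Bin index of the ground-truth coordinate, clamped to the valid range.
    target_bin = min(n - 1, max(0, int((target_cm - lo) // bin_width)))
    ce = -math.log(probs[target_bin])
    # Continuous prediction: expectation over the bin centers.
    centers = [lo + (i + 0.5) * bin_width for i in range(n)]
    pred_cm = sum(p * c for p, c in zip(probs, centers))
    return ce + alpha * abs(pred_cm - target_cm), pred_cm
```

In this sketch the x and y heads would each produce their own logits and call `axis_loss` separately, mirroring the independent per-coordinate layers the abstract describes; `alpha` sets the balance between the classification and regression terms.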
Topics & Keywords
Authors (3)
Yuying Zhang
Fei Xu
Yi Yang
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.3390/app152312475
- Access
- Open Access ✓