IGSMNet: Ingredient-Guided Semantic Modeling Network for Food Nutrition Estimation
Abstract
In recent years, food nutrition estimation has received growing attention due to its critical role in dietary analysis and public health. Traditional nutrition assessment methods often rely on manual measurements and expert knowledge, which are time-consuming and not easily scalable. With the advancement of computer vision, RGB-based methods have been proposed, and more recently, RGB-D-based approaches have further improved performance by incorporating depth information to capture spatial cues. While these methods have shown promising results, they still face challenges in complex food scenes, such as a limited ability to distinguish visually similar items with different ingredients and insufficient modeling of spatial or semantic relationships. To address these issues, we propose an Ingredient-Guided Semantic Modeling Network (IGSMNet) for food nutrition estimation. The method introduces an ingredient-guided module that encodes ingredient information using a pre-trained language model and aligns it with visual features via cross-modal attention. At the same time, an internal semantic modeling component enhances structural understanding through dynamic positional encoding and localized attention, enabling fine-grained relational reasoning. On the Nutrition5k dataset, our method achieves PMAE values of 12.2% for Calories, 9.4% for Mass, 19.1% for Fat, 18.3% for Carb, and 16.0% for Protein. These results demonstrate that our IGSMNet consistently outperforms existing baselines, validating its effectiveness.
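The cross-modal alignment described in the abstract can be illustrated with a minimal sketch: visual patch features act as queries that attend over ingredient text embeddings via scaled dot-product attention. This is an illustrative sketch under assumed feature shapes (a 7×7 patch grid, 5 ingredient tokens, 64-dimensional features), not the authors' actual implementation.

```python
import numpy as np

def cross_modal_attention(visual, text, d_k):
    """Align visual features with ingredient embeddings.

    visual: (N_v, d) patch features used as queries.
    text:   (N_t, d) ingredient token embeddings used as keys/values.
    Shapes and the single-head formulation are illustrative assumptions.
    """
    scores = visual @ text.T / np.sqrt(d_k)            # (N_v, N_t) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over ingredients
    return weights @ text                              # (N_v, d) ingredient-aligned features

rng = np.random.default_rng(0)
vis = rng.standard_normal((49, 64))   # hypothetical 7x7 patch grid of features
txt = rng.standard_normal((5, 64))    # hypothetical 5 ingredient-token embeddings
out = cross_modal_attention(vis, txt, 64)
print(out.shape)  # (49, 64)
```

In a full model, the aligned output would typically be fused back into the visual stream (e.g. by residual addition) before the regression heads that predict calories, mass, and macronutrients.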
Authors (5)
Donglin Zhang
Weixiang Shi
Boyuan Ma
Weiqing Min
Xiao-Jun Wu
Quick Access
- Publication Year
- 2025
- Language
- en
- Source Database
- CrossRef
- DOI
- 10.3390/foods14213697
- Access
- Open Access ✓