Tibetan Medical Named Entity Recognition Based on Syllable‐Word‐Sentence Embedding Transformer
Abstract
Tibetan medical named entity recognition (Tibetan MNER) extracts specific types of medical entities from unstructured Tibetan medical texts and provides important data support for work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to capture multi‐level semantic information: they neither extract multi‐granularity features sufficiently nor filter out irrelevant information effectively, which reduces the accuracy of entity recognition. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. The method draws on features at different granularities and uses un‐scaled dot‐product attention to focus on key features during feature fusion. The resulting syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. By leveraging multi‐level and multi‐granularity semantic information, the model improves the performance of Tibetan MNER. We evaluate the proposed model on datasets from different domains. On the Tibetan news dataset we constructed, the model identifies three types of entities and achieves an F1 score of 93.59%, an improvement of 1.24% over the vanilla FLAT. On the Tibetan medical dataset we developed, it identifies five kinds of medical entities with an F1 score of 71.39%, a 1.34% improvement over the vanilla FLAT.
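To make the fusion step concrete, the following is a minimal sketch of un‐scaled dot‐product attention over features of different granularity. It is an illustration under stated assumptions, not the paper's implementation: the function names `softmax` and `unscaled_attention_fuse`, the use of the syllable vector as the query, and the three toy vectors are all hypothetical. The only property taken from the abstract is that the dot‐product scores are not divided by the square root of the dimension before the softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unscaled_attention_fuse(query, features):
    """Fuse feature vectors with un-scaled dot-product attention.

    query:    one vector of length d (assumed here: a syllable-level embedding)
    features: list of vectors of length d (assumed here: syllable, word,
              and sentence views of the same position)
    Returns (fused_vector, attention_weights).
    """
    # Raw dot products; unlike standard scaled attention there is
    # no division by sqrt(d) before the softmax.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    d = len(query)
    # Weighted sum of the feature vectors, one dimension at a time.
    fused = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(d)]
    return fused, weights


# Toy, made-up embeddings (hypothetical values for illustration only).
syllable = [1.0, 0.0, 0.5]
word = [0.8, 0.1, 0.4]
sentence = [0.0, 1.0, 0.0]

fused, weights = unscaled_attention_fuse(syllable, [syllable, word, sentence])
print(weights)
print(fused)
```

In this toy setting the views most similar to the query receive the largest weights, so dissimilar features contribute less to the fused vector. How the actual model forms queries, keys, and values, and where the fused embedding enters the transformer, is not specified in the abstract.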
Authors (10)
Jin Zhang
Ziyue Zhang
Lobsang Yeshi
Dorje Tashi
Xiangshi Wang
Yuqing Cai
Yongbin Yu
Xiangxiang Wang
Nyima Tashi
Gadeng Luosang
Quick Access
- Publication Year: 2025
- Source Database: DOAJ
- DOI: 10.1049/cit2.70029
- Access: Open Access