MFSM: Chinese-English sentence alignment based on multi-feature self-attention mechanism fusion
Abstract
Bilingual parallel corpora are a fundamental resource for statistical natural language processing. Chinese-English bilingual text contains cross alignments and empty alignments, which readily degrade the quality of sentence alignment. We therefore propose a novel Chinese-English sentence alignment method based on multi-feature self-attention mechanism fusion. First, the length features of the Chinese and English sentences are integrated into GloVe word vectors, and a bidirectional gated recurrent unit (BiGRU) encodes the resulting feature word vectors to obtain finer-grained local sentence information. Second, an interactive attention mechanism extracts global information from the bilingual sentences so that contextual semantic features are used effectively. Finally, the Kuhn-Munkres (KM) algorithm is applied on top of a multi-layer perceptron, which allows the model to handle non-monotonically aligned text and improves its generalization ability. Experiments show that the proposed method achieves an F-measure above 90%, improves the precision and recall of sentence alignment, and makes the construction of Chinese-English parallel corpora more efficient.
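The final step of the abstract, KM matching over pairwise scores, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a square matrix of alignment scores (in the paper these would presumably come from the multi-layer perceptron; here they are made-up numbers) and solves the maximum-weight assignment with a standard Hungarian (Kuhn-Munkres) routine. The names `km_max_assignment` and `align`, and the `threshold` used to treat low-scoring pairs as empty alignments, are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple


def km_max_assignment(score: List[List[float]]) -> List[int]:
    """Hungarian (Kuhn-Munkres) algorithm for a square score matrix.

    Returns match, where match[i] is the column assigned to row i such
    that the summed score is maximal.
    """
    n = len(score)
    # The routine below minimizes cost, so negate the scores.
    cost = [[-score[i][j] for j in range(n)] for i in range(n)]
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row matched to column j, 0 = unmatched
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def align(score: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) pairs from the optimal matching.

    Pairs scoring below `threshold` are dropped, i.e. treated as empty
    alignments. The threshold value is an assumption, not from the paper.
    """
    match = km_max_assignment(score)
    return [(i, j) for i, j in enumerate(match) if score[i][j] >= threshold]


# Hypothetical scores: rows are Chinese sentences, columns English sentences.
scores = [
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.3, 0.7],
]
print(align(scores))  # [(0, 1), (1, 0), (2, 2)]
```

Because the matching is global rather than left-to-right, the first two sentences are paired in crossed order, which is the non-monotonic case the abstract mentions. A monotonic dynamic-programming aligner could not produce that pairing.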
Authors (1)
Baolong Li
- Year of Publication
- 2025
- Source Database
- DOAJ
- DOI
- 10.6180/jase.202510_28(10).0005
- Access
- Open Access ✓