DOAJ Open Access 2025

MFSM: Chinese-English sentence alignment based on multi-feature self-attention mechanism fusion

Baolong Li

Abstract

Bilingual parallel corpora are an important basic resource for research in statistics-based natural language processing. Chinese-English bilingual text contains cross alignments and empty alignments, which easily degrade the quality of Chinese-English sentence alignment. We therefore propose a novel Chinese-English sentence alignment method based on multi-feature self-attention mechanism fusion. First, the long features of Chinese-English bilingual sentences are integrated into the GloVe word vectors, and a bidirectional gated recurrent unit encodes the resulting feature word vectors to obtain more fine-grained local sentence information. Second, an interactive attention mechanism is introduced to extract global information from the bilingual sentences, ensuring effective use of contextual semantic features. Finally, the Kuhn-Munkres (KM) algorithm is applied on top of a multi-layer perceptron, which allows the model to handle non-monotonically aligned text and improves its generalization ability. Experiments show that the F-score of the proposed method exceeds 90%, and that the method effectively improves the precision and recall of sentence alignment as well as the efficiency of constructing Chinese-English parallel corpora.
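The final step described above can be illustrated with a minimal sketch, not the author's implementation. It assumes a matrix of pairwise alignment scores (in the paper these would come from the multi-layer perceptron; here they are hand-made values) and uses a standard pure-Python Kuhn-Munkres (Hungarian) assignment to pick the one-to-one pairing with the highest total score. The function names `km_align` and `_hungarian_min`, and the `threshold` used to leave low-scoring sentences unaligned, are hypothetical; the abstract does not say how empty alignments are actually handled.

```python
def _hungarian_min(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.

    Returns a list of (row, col) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)        # row potentials
    v = [0.0] * (m + 1)        # column potentials
    p = [0] * (m + 1)          # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def km_align(scores, threshold=0.5):
    """Pick the one-to-one sentence pairing with maximum total score.

    scores[i][j] is an assumed similarity between Chinese sentence i and
    English sentence j. Pairs scoring below `threshold` (a hypothetical
    cut-off) are dropped, leaving those sentences unaligned.
    """
    n, m = len(scores), len(scores[0])
    transposed = n > m
    matrix = [list(col) for col in zip(*scores)] if transposed else scores
    # Negate scores so that minimising cost maximises total similarity.
    cost = [[-s for s in row] for row in matrix]
    pairs = _hungarian_min(cost)
    if transposed:
        pairs = [(j, i) for i, j in pairs]
    return sorted((i, j) for i, j in pairs if scores[i][j] >= threshold)


# Hand-made scores: sentences 0 and 1 are swapped between the two languages
# (a cross alignment) and sentence 2 has no good counterpart.
example_scores = [
    [0.10, 0.90, 0.20],
    [0.80, 0.20, 0.10],
    [0.10, 0.30, 0.05],
]
example_pairs = km_align(example_scores)
```

With these example scores, `example_pairs` is `[(0, 1), (1, 0)]`: the crossed pair is recovered even though it is non-monotonic, and the third sentence on each side stays unaligned because its best available score falls below the assumed threshold.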

Authors (1)

Baolong Li

Citation Format

Li, B. (2025). MFSM: Chinese-English sentence alignment based on multi-feature self-attention mechanism fusion. https://doi.org/10.6180/jase.202510_28(10).0005

Journal Information
Publication Year
2025
Source Database
DOAJ
DOI
10.6180/jase.202510_28(10).0005
Access
Open Access ✓