arXiv Open Access 2025

SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, +4 others

Abstract

Query-product relevance prediction is vital for AI-driven e-commerce, yet current LLM-based approaches face a dilemma: SFT and DPO struggle with long-tail generalization due to coarse supervision, while traditional RLVR suffers from sparse feedback that fails to correct intermediate reasoning errors. We propose Stepwise Hybrid Examination (SHE), an RL framework that ensures logical consistency through Stepwise Reward Policy Optimization (SRPO). SRPO uses a hybrid reward mechanism, combining generative reward models with human-annotated verifiers, to provide fine-grained, step-level signals. To further enhance stability, SHE incorporates diversified data filtering to maintain policy entropy and a multi-stage curriculum learning protocol for progressive skill acquisition. Extensive experiments on real-world search benchmarks show that SHE improves both reasoning quality and relevance-prediction accuracy in large-scale e-commerce settings, outperforming SFT, DPO, GRPO, and other baselines, while also enhancing interpretability and robustness.
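
The abstract does not give the reward formula, so the following is only a minimal illustrative sketch of what a step-level hybrid reward could look like, not the authors' SRPO implementation. The names `StepScore`, `hybrid_step_rewards`, and `verifier_weight`, the linear blending rule, and the outcome bonus on the last step are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepScore:
    # Hypothetical score in [0, 1] from a generative reward model for this step.
    grm_score: float
    # Hypothetical human-annotated verifier result; None when no annotation exists.
    verifier_ok: Optional[bool]


def hybrid_step_rewards(
    steps: List[StepScore],
    final_correct: bool,
    verifier_weight: float = 0.5,  # assumed blending weight, not from the paper
) -> List[float]:
    """Return one reward per reasoning step (illustrative blending rule)."""
    rewards = []
    for step in steps:
        if step.verifier_ok is None:
            # No verifier signal: fall back to the reward-model score alone.
            reward = step.grm_score
        else:
            # Blend the reward-model score with the binary verifier outcome.
            reward = (1.0 - verifier_weight) * step.grm_score + verifier_weight * float(
                step.verifier_ok
            )
        rewards.append(reward)
    if rewards and final_correct:
        # Assumed outcome bonus on the last step when the final label is correct.
        rewards[-1] += 1.0
    return rewards


example_steps = [StepScore(0.8, True), StepScore(0.4, None), StepScore(0.6, False)]
example_rewards = hybrid_step_rewards(example_steps, final_correct=True)
```

Compared with a single outcome reward per response, a per-step list like `example_rewards` gives the policy a separate signal for each intermediate reasoning step, which is the property the abstract attributes to step-level supervision.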

Topics & Keywords

cs.AI

Authors (9)

Pengkun Jiao

Yiming Jin

Jianhui Yang

Chenhe Dong

Zerui Huang

Shaowei Yao

Xiaojiang Zhou

Dan Ou

Haihong Tang

Citation

Jiao, P., Jin, Y., Yang, J., Dong, C., Huang, Z., Yao, S. et al. (2025). SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance. https://arxiv.org/abs/2510.07972

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access