arXiv Open Access 2025

Inferring Reward Machines and Transition Machines from Partially Observable Markov Decision Processes

Yuly Wu, Jiamou Liu, Libo Zhang

Abstract

Partially Observable Markov Decision Processes (POMDPs) are fundamental to many real-world applications. Although reinforcement learning (RL) has shown success in fully observable domains, learning policies from traces in partially observable environments remains challenging due to non-Markovian observations. Inferring an automaton to handle the non-Markovianity has proven effective, but this approach faces two limitations: (1) existing automaton representations focus only on reward-based non-Markovianity, leading to unnatural problem formulations; (2) inference algorithms incur enormous computational costs. To address the first limitation, we introduce Transition Machines (TMs) to complement existing Reward Machines (RMs). To develop a unified inference algorithm for both automaton types, we propose the Dual Behavior Mealy Machine (DBMM), which subsumes both TMs and RMs. We then introduce DB-RPNI, a passive automata learning algorithm that efficiently infers DBMMs while avoiding the costly reductions required by prior work. We further develop optimization techniques and identify sufficient conditions for inferring the minimal correct automata. Experimentally, our inference method achieves speedups of up to three orders of magnitude over state-of-the-art baselines.
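The abstract does not define the DBMM formally, but Reward Machines, Transition Machines, and the DBMM are all variants of Mealy machines: finite-state automata that emit an output on every transition. As a purely illustrative sketch (the class, the field names, and the two-state example are hypothetical and not taken from the paper), a Mealy machine over dictionaries might look like:

```python
# Illustrative only: a minimal Mealy machine, the automaton family that
# Reward Machines and Transition Machines specialize. A reward machine
# would emit rewards as outputs; a transition machine would emit
# transition-relevant signals instead.
class MealyMachine:
    def __init__(self, initial, delta, lam):
        self.initial = initial  # initial state
        self.delta = delta      # transition fn: (state, input) -> next state
        self.lam = lam          # output fn: (state, input) -> output

    def run(self, inputs):
        """Feed an input trace; return the corresponding output trace."""
        state, outputs = self.initial, []
        for sym in inputs:
            outputs.append(self.lam[(state, sym)])
            state = self.delta[(state, sym)]
        return outputs

# Hypothetical example: emit output 1 exactly when 'b' follows 'a'.
m = MealyMachine(
    initial=0,
    delta={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0},
    lam={(0, 'a'): 0, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
)
print(m.run(['a', 'b', 'b']))  # [0, 1, 0]
```

A passive learner in the RPNI family would start from a tree of such observed input/output traces and greedily merge states that never disagree on outputs; the paper's DB-RPNI adapts this idea to the dual-output setting.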


Authors (3)


Yuly Wu


Jiamou Liu


Libo Zhang

Citation

Wu, Y., Liu, J., & Zhang, L. (2025). Inferring Reward Machines and Transition Machines from Partially Observable Markov Decision Processes. arXiv. https://arxiv.org/abs/2508.01947


Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓