Results for "Labor policy. Labor and the state"

Showing 20 of ~3875116 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL

Yixiao Zhou, Yang Li, Dongzhou Cheng et al.

Reinforcement Learning from Verifiable Rewards (RLVR) trains large language models (LLMs) from sampled trajectories, making decoding strategy a core component of learning rather than a purely inference-time choice. Sampling temperature directly controls the exploration-exploitation trade-off by modulating policy entropy, yet existing methods rely on static values or heuristic adaptations that are decoupled from task-level rewards. We propose Introspective LLM, a hierarchical reinforcement learning framework that learns to control sampling temperature during generation. At each decoding step, the model selects a temperature based on its hidden state and samples the next token from the resulting distribution. Temperature and token policies are jointly optimized from downstream rewards using a coordinate ascent scheme. Experiments on mathematical reasoning benchmarks show that learned temperature policies outperform fixed and heuristic baselines, while exhibiting interpretable exploration behaviors aligned with reasoning uncertainty.

en cs.LG, cs.AI
arXiv Open Access 2026
Action-Free Offline-to-Online RL via Discretised State Policies

Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Most existing offline RL methods presume the availability of action labels within the dataset, but in many practical scenarios, actions may be missing due to privacy, storage, or sensor limitations. We formalise the setting of action-free offline-to-online RL, where agents must learn from datasets consisting solely of $(s,r,s')$ tuples and later leverage this knowledge during online interaction. To address this challenge, we propose learning state policies that recommend desirable next-state transitions rather than actions. Our contributions are twofold. First, we introduce a simple yet novel state discretisation transformation and propose Offline State-Only DecQN, a value-based algorithm designed to pre-train state policies from action-free data. The algorithm integrates the transformation to scale efficiently to high-dimensional problems while avoiding instability and overfitting associated with continuous state prediction. Second, we propose a novel mechanism for guided online learning that leverages these pre-trained state policies to accelerate the learning of online agents. Together, these components establish a scalable and practical framework for leveraging action-free datasets to accelerate online RL. Empirical results across diverse benchmarks demonstrate that our approach improves convergence speed and asymptotic performance, while analyses reveal that discretisation and regularisation are critical to its effectiveness.

en stat.ML, cs.AI
DOAJ Open Access 2025
Qualité de l’emploi en République de Corée: amélioration ou détérioration?

The authors examine whether job quality has improved in the Republic of Korea alongside the country's economic growth. They consider seven dimensions of job quality and find that these evolved in different directions over the period 2006-2020. The "working time quality" index improved the most, owing to the reduction in statutory working hours. There was also progress in "earnings" and "workplace relations", but a decline in the dimensions "prospects", "skills and autonomy", "work intensity" and "working environment". The authors also examine two major sources of inequality. The gap between graduates and non-graduates narrowed in six dimensions, and men fare better than women in three dimensions. Given the growing evidence that job quality affects health and well-being, these results call into question the idea that economic growth and social progress go hand in hand.

Labor systems, Labor market. Labor supply. Labor demand
arXiv Open Access 2025
Evolutionary Policy Optimization

Jianren Wang, Yifan Su, Abhinav Gupta et al.

On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additional parallel environments yield redundant data due to limited policy-induced diversity. In contrast, Evolutionary Algorithms (EAs) scale naturally and encourage exploration via randomized population-based search, but are often sample-inefficient. We propose Evolutionary Policy Optimization (EPO), a hybrid algorithm that combines the scalability and diversity of EAs with the performance and stability of policy gradients. EPO maintains a population of agents conditioned on latent variables, shares actor-critic network parameters for coherence and memory efficiency, and aggregates diverse experiences into a master agent. Across tasks in dexterous manipulation, legged locomotion, and classic control, EPO outperforms state-of-the-art baselines in sample efficiency, asymptotic performance, and scalability.

en cs.LG, cs.AI
arXiv Open Access 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning

Kalyan Cherukuri, Aarav Lala, Yash Yardi

We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum superposition, enabling simultaneous evaluation of multiple state-action pairs via amplitude encoding and quantum parallelism. We introduce a quantum-enhanced policy iteration algorithm with provable polynomial reductions in sample complexity for the evaluation step, under standard assumptions. To demonstrate the technical feasibility and theoretical soundness of our approach, we validate Q-Policy on classical emulations of small discrete control tasks. Due to current hardware and simulation limitations, our experiments focus on showcasing proof-of-concept behavior rather than large-scale empirical evaluation. Our results support the potential of Q-Policy as a theoretical foundation for scalable RL on future quantum devices, addressing RL scalability challenges beyond classical approaches.

en cs.LG, cs.AI
arXiv Open Access 2025
Canonical Policy: Learning Canonical 3D Representation for SE(3)-Equivariant Policy

Zhiyuan Zhang, Zhengtong Xu, Jai Nanda Lakamsani et al.

Visual imitation learning has achieved remarkable progress in robotic manipulation, yet generalization to unseen objects, scene layouts, and camera viewpoints remains a key challenge. Recent advances address this by using 3D point clouds, which provide geometry-aware, appearance-invariant representations, and by incorporating equivariance into policy architectures to exploit spatial symmetries. However, existing equivariant approaches often lack interpretability and rigor due to unstructured integration of equivariant components. We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies 3D point cloud observations under a canonical representation. We first establish a theory of 3D canonical representations, enabling equivariant observation-to-action mappings by grouping both seen and novel point clouds to a canonical representation. We then propose a flexible policy learning pipeline that leverages geometric symmetries from canonical representation and the expressiveness of modern generative models. We validate canonical policy on 12 diverse simulated tasks and 4 real-world manipulation tasks across 16 configurations, involving variations in object color, shape, camera viewpoint, and robot platform. Compared to state-of-the-art imitation learning policies, canonical policy achieves an average improvement of 18.0% in simulation and 39.7% in real-world experiments, demonstrating superior generalization capability and sample efficiency. For more details, please refer to the project website: https://zhangzhiyuanzhang.github.io/cp-website/.

en cs.RO
arXiv Open Access 2025
Learning Reward Machines from Partially Observed Policies

Mohamad Louai Shehab, Antoine Aspeel, Necmiye Ozay

Inverse reinforcement learning is the problem of inferring a reward function from an optimal policy or demonstrations by an expert. In this work, it is assumed that the reward is expressed as a reward machine whose transitions depend on atomic propositions associated with the state of a Markov Decision Process (MDP). Our goal is to identify the true reward machine using finite information. To this end, we first introduce the notion of a prefix tree policy which associates a distribution of actions to each state of the MDP and each attainable finite sequence of atomic propositions. Then, we characterize an equivalence class of reward machines that can be identified given the prefix tree policy. Finally, we propose a SAT-based algorithm that uses information extracted from the prefix tree policy to solve for a reward machine. It is proved that if the prefix tree policy is known up to a sufficient (but finite) depth, our algorithm recovers the exact reward machine up to the equivalence class. This sufficient depth is derived as a function of the number of MDP states and (an upper bound on) the number of states of the reward machine. These results are further extended to the case where we only have access to demonstrations from an optimal policy. Several examples, including discrete grid and block worlds, a continuous state-space robotic arm, and real data from experiments with mice, are used to demonstrate the effectiveness and generality of the approach.

en cs.LG, cs.FL
arXiv Open Access 2025
Incorporating AI incident reporting into telecommunications law and policy: Insights from India

Avinash Agarwal, Manisha J. Nene

The integration of artificial intelligence (AI) into telecommunications infrastructure introduces novel risks, such as algorithmic bias and unpredictable system behavior, that fall outside the scope of traditional cybersecurity and data protection frameworks. This paper introduces a precise definition and a detailed typology of telecommunications AI incidents, establishing them as a distinct category of risk that extends beyond conventional cybersecurity and data protection breaches. It argues for their recognition as a distinct regulatory concern. Using India as a case study for jurisdictions that lack a horizontal AI law, the paper analyzes the country's key digital regulations. The analysis reveals that India's existing legal instruments, including the Telecommunications Act, 2023, the CERT-In Rules, and the Digital Personal Data Protection Act, 2023, focus on cybersecurity and data breaches, creating a significant regulatory gap for AI-specific operational incidents, such as performance degradation and algorithmic bias. The paper also examines structural barriers to disclosure and the limitations of existing AI incident repositories. Based on these findings, the paper proposes targeted policy recommendations centered on integrating AI incident reporting into India's existing telecom governance. Key proposals include mandating reporting for high-risk AI failures, designating an existing government body as a nodal agency to manage incident data, and developing standardized reporting frameworks. These recommendations aim to enhance regulatory clarity and strengthen long-term resilience, offering a pragmatic and replicable blueprint for other nations seeking to govern AI risks within their existing sectoral frameworks.

en cs.CY, cs.AI
DOAJ Open Access 2024
EDUCAÇÃO E FORÇA DE TRABALHO EM UMA ECONOMIA PRIMÁRIO-EXPORTADORA: O PANORAMA DAS OCUPAÇÕES PARA EGRESSOS DO ENSINO MÉDIO DA MICRORREGIÃO DE CAPANEMA – PR

Luciano Edison da Silva

The concerns that motivated this research go back to the student trajectory from basic to university education, always weighing the continuation of studies against entry into the labor market. As a teacher, the same anguish is experienced in the face of high dropout among young people in the final stage of basic education. From 2016 onward, with the first research experiences at the Federal Institute of Rondônia, this distress took on a more scientific shape, investigating the motivations behind this centrifugal movement in upper secondary education.

Special aspects of education, Labor market. Labor supply. Labor demand
DOAJ Open Access 2024
A REFORMA DO ENSINO MÉDIO PAULISTA E O APARTHEID SOCIAL E EDUCACIONAL

Felipe Alencar

The analysis of the upper secondary education reform in the São Paulo state school network starts from the Gramscian framework of the unitary school and proceeds through the Inova Educação program, a component of all the formative itineraries of the Novo Ensino Médio. Based on documents, educational and labor force indicators, statements by the private agents who formulated the program, and interviews with educators, it is argued that the reform, in a context of austerity and informal work, institutionalizes social and educational apartheid by stripping knowledge from schooling with a view to subaltern labor. Keywords: Upper secondary education reform. Inova Educação. Work and education. Educational policies. São Paulo state school network.

Special aspects of education, Labor market. Labor supply. Labor demand
DOAJ Open Access 2024
TECNOLOGIA SOCIAL: DESAFIOS ÀS ORGANIZAÇÕES DE CATADORES DE MATERIAIS RECICLÁVEIS

Ana Paula Dalmás Rodrigues, Sandro Benedito Sguarezi, Douglas Alexandre de Campos Castrillon Junior

The article presents the app being built together with the Organizations of Recyclable Material Collectors (OCMR) of the Alto Pantanal Mato-Grossense. The objective is to analyze field data, identifying challenges through the action-research method supported by bibliographic and descriptive techniques, socioeconomic diagnosis, and interviews with the research subjects. The app is expected to improve direct sales processes between the OCMR and the industries that purchase recyclable materials, strengthening the OCMR's bargaining power. Keywords: App; Association; Cooperative; Solid Waste.

Special aspects of education, Labor market. Labor supply. Labor demand
DOAJ Open Access 2024
Transformation of Migration Policy of the Republic of Ireland

Oleg V. Okhoshin

In 2023, Dublin experienced massive anti-immigrant riots amid a growing number of asylum seekers in the Republic of Ireland. These events clearly demonstrated that the government was gradually losing control of the situation and was unable to fully ensure the resettlement and effective integration of migrants into the host community. The Republic of Ireland planned to maintain the competitiveness of its national economy by attracting highly skilled workers from new EU member states. For this purpose, migration legislation was revised, relaxing the rules for entry into the country. However, in the 2010s, the migration trend changed: the number of foreign nationals from developing countries, which constituted the market for cheap, low-skilled labor, increased sharply in Ireland. At the same time, the influx of refugees increased after the “Arab Spring” and the beginning of the Special Military Operation in Ukraine. Migration waves put additional pressure on the state budget, public services, urban infrastructure and social stability. In 2024, the Republic of Ireland joined the EU pact to jointly solve the common problem of European countries. The article examines the transformation of the state migration policy in its close relationship with the UK and the EU, as well as the dynamics of migration trends in the Republic of Ireland in the 21st century.

International relations
DOAJ Open Access 2024
New Directions of Labor Migration From Tajikistan: The Case of the UK

Sergey V. Ryazantsev, Abubakr Kh. Rakhmonov, Elena E. Pismennaya

Introduction. Tajikistan is one of the few countries in the world whose state budget is largely based on tax revenues from remittances from citizens working abroad. The deterioration of the economic situation in Russia has forced migrants to look for a new direction of labor migration. In particular, the Government of Tajikistan itself is interested in reorienting migrants to a new direction. In 2021, Tajikistan signed an employment agreement with the United Kingdom (UK) and the Republic of Korea to send seasonal migrants. Goals. The article aims to identify the prospects and trends in the development of labor emigration from Tajikistan to the UK, as well as the features and channels of seasonal migration from Tajikistan to the UK. Materials and methods. In this study, statistical and sociological research methods are used. The statistics include numbers of seasonal migrants from Tajikistan for a variety of years. A content analysis of interviews with labor migrants to the UK has been conducted. The analytical method includes systematization of the materials of recruitment agencies’ websites and bilateral agreements of the Republic of Tajikistan with foreign countries. The data are provided by the Agency on Statistics under the President of the Republic of Tajikistan, the National Statistical Service of Great Britain, the World Bank, and the International Organization for Migration (IOM). Currently, the Agency on Statistics under the President of the Republic of Tajikistan has information on the unemployment rate in the country, the economically active population, as well as the number of Tajik migrants until 2021; the World Bank provides data until 2021. Data on the number of seasonal migrants and the structure of labor migration from Tajikistan at the end of 2022 are published in annual or quarterly sections and posted on the official website of the National Statistical Service of the UK. Results. The UK, which needs cheap labor for agriculture, was forced after Brexit to expand the geography of attracting labor. In addition to the traditional region for hiring seasonal workers, the Eastern European countries (Poland, Baltic states, Bulgaria), labor resources began to be actively attracted from Central Asian countries, including Tajikistan. The main factor in attracting migrants from Tajikistan to the UK is the shortage of labor in the country’s labor market after leaving the EU. The main reasons for the reorientation of Tajik migrants from the Russian labor market to the UK are as follows: economic crises in Russia; tightening of migration policy towards Tajik migrants; the spread of English among the youth of Tajikistan, etc. Migration of low-skilled citizens of Tajikistan began in the second quarter of 2021. The main type of migration from Tajikistan to the UK is seasonal migration. The main channels of emigration from Tajikistan to the UK are public and private recruitment agencies. Chief among them is the state institution Center for Counseling and Pre-Departure Training of Migrant Workers.

History (General), Oriental languages and literatures
DOAJ Open Access 2024
CONSTRUINDO UM NOVO PACTO PARA A EDUCAÇÃO: O PAPEL DA TECNOCIÊNCIA SOLIDÁRIA

Renato Dagnino

This text focuses on socioeconomic aspects of policy and politics related to knowledge production, conditioned by a pact, mediated by the capitalist State, between the property-owning and working classes. Adopting the perspective of the latter, it investigates the characteristics that a new pact “beyond capital” should have. As is usual in the Latin American critical tradition, the text first deals, by way of example, with how those aspects manifest themselves in the central countries. Being anchored in the historical experience and aspirations of subaltern social actors and oriented toward their achievement in the periphery of capitalism, it points to a path for building a new pact that takes the values and interests of the solidarity economy as its reference. In its second part, the text presents the role that Solidarity Technoscience can play in paving this path. Keywords: Pact for education; Latin America; Cognitive Policy; Solidarity Technoscience

Special aspects of education, Labor market. Labor supply. Labor demand
arXiv Open Access 2024
Frustrations in the ground state of a dilute Ising chain in a magnetic field

Yury Panov

The properties of the ground state of one of the simplest models of frustrated magnetic systems, a dilute Ising chain in a magnetic field, are considered for all values of the concentration of charged non-magnetic impurities. An analytical method is proposed for calculating the residual entropy of frustrated states, including states at the boundaries between the phases of the ground state, which is based on the Markov property of the system under consideration and allows direct generalization to other one-dimensional spin models with Ising-type interactions. The properties of local distributions and concentration dependences of the composition, correlation functions, magnetization and entropy of the phases of the ground state of the model are investigated. It is shown that the field-induced transition from the antiferromagnetic ground state to the frustrated one is accompanied by charge ordering and the absence of pseudo-transitions in the dilute Ising chain is proved.

en cond-mat.stat-mech
arXiv Open Access 2024
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

Jiesong Lian, Yucong Huang, Chengdong Ma et al.

For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective framework for solving such games. However, current methods initialize a new policy from scratch or inherit a single historical policy in Best Response (BR), missing the opportunity to leverage past policies to generate a better BR. In this paper, we propose Fusion-PSRO, which employs Nash Policy Fusion to initialize a new policy for BR training. Nash Policy Fusion serves as an implicit guiding policy that starts exploration on the current Meta-NE, thus providing a closer approximation to BR. Moreover, it insightfully captures a weighted moving average of past policies, dynamically adjusting these weights based on the Meta-NE in each iteration. This cumulative process further enhances the policy population. Empirical results on classic benchmarks show that Fusion-PSRO achieves lower exploitability, thereby mitigating the shortcomings of previous research on policy initialization in BR.

en cs.GT, cs.AI

Page 44 of 193756