Results for "Labor policy. Labor and the state"

S2 Open Access 1934

The Theory of Economic Development: An Inquiry into Profits, Capital, Credit, Interest, and the Business Cycle

J. Schumpeter

10174 citations en Economics

S2 Open Access 1969

A Model for Labor Migration and Urban Unemployment in Less Developed Countries

M. Todaro

3231 citations en Economics

CrossRef Open Access 2026

Agricultural Exceptionalism: Development of a Labor Law Equity Index to Capture Variation in State Labor Protections for U.S. Agricultural Workers

Erica Chavez Santos, India J. Ornelas, Heather D. Hill et al.

Agricultural workers are often excluded from labor laws; while some states have expanded labor protections and health and safety rules for agricultural workers, many states have not. We developed an agricultural worker labor law equity index (LLEI) using legal epidemiology methods. The LLEI evaluates state laws related to 3 labor protection topics (workers’ compensation, minimum wage, and overtime pay) across 39 states from 2001 to 2017. For each topic, we scored states according to the extent to which they afforded protections to agricultural workers, where higher LLEI scores indicate more inclusive labor protections for agricultural workers. Most states had positive scores (N = 23, 59%), 8 states (20.5%) scored 0, and 8 states (20.5%) had negative scores. This study provides a greater understanding of the variation in labor protections for agricultural workers across 39 U.S. states. Future studies can use the LLEI to examine the impact of labor laws on agricultural workers’ health.

en

arXiv Open Access 2025

Learning General Policies with Policy Gradient Methods

Simon Ståhlberg, Blai Bonet, Hector Geffner

While reinforcement learning methods have delivered remarkable results in a number of settings, generalization, i.e., the ability to produce policies that generalize in a reliable and systematic way, has remained a challenge. The problem of generalization has been addressed formally in classical planning where provable correct policies that generalize over all instances of a given domain have been learned using combinatorial methods. The aim of this work is to bring these two research threads together to illuminate the conditions under which (deep) reinforcement learning approaches, and in particular, policy optimization methods, can be used to learn policies that generalize like combinatorial methods do. We draw on lessons learned from previous combinatorial and deep learning approaches, and extend them in a convenient way. From the former, we model policies as state transition classifiers, as (ground) actions are not general and change from instance to instance. From the latter, we use graph neural networks (GNNs) adapted to deal with relational structures for representing value functions over planning states, and in our case, policies. With these ingredients in place, we find that actor-critic methods can be used to learn policies that generalize almost as well as those obtained using combinatorial approaches while avoiding the scalability bottleneck and the use of feature pools. Moreover, the limitations of the DRL methods on the benchmarks considered have little to do with deep learning or reinforcement learning algorithms, and result from the well-understood expressive limitations of GNNs, and the tradeoff between optimality and generalization (general policies cannot be optimal in some domains). Both of these limitations are addressed without changing the basic DRL methods by adding derived predicates and an alternative cost structure to optimize.

en cs.AI, cs.LG

DOAJ Open Access 2025

Reproductive politics and women's empowerment; how does geopolitics control women?

Gayathri Delanerolle, Sohier Elneil et al.

Reproductive politics lie at the intersection of gender, power, and governance, shaping women's autonomy through laws, policies, and cultural norms. Historically, colonialism and population control initiatives marginalized women, particularly in the Global South, fostering distrust in healthcare systems. Feminist movements advocate for reproductive justice, yet economic and nationalistic interests continue to influence access to care. Governments regulate reproduction to control demographics, labor markets, and national power. Pronatalist and antinatalist policies, such as China's One-Child Policy, have led to coercive interventions, disproportionately affecting marginalized communities. Reproductive politics also shape masculinity, fatherhood, and state-controlled family structures. Global reproductive policies reflect ideological struggles, from restrictive abortion laws in Poland and the U.S. to progressive approaches in Nepal and Vietnam. Socioeconomic barriers further limit access to contraception, maternal healthcare, and fertility treatments. Achieving reproductive justice requires inclusive policies, healthcare reform, and recognition of reproductive rights as fundamental to gender equality.

Gynecology and obstetrics, Women. Feminism

DOAJ Open Access 2025

Strategies for Employment Policy Management and Labor Market Development in Mangistau Oblast

Esturlieva A.I., Kizimbayeva A., Bekbergenova Zh.T. et al.

This study examines the management of employment policy in Mangystau Oblast, focusing on monitoring strategies and the effectiveness of government initiatives. The research aims to identify reliable mechanisms for improving employment policies, reducing unemployment, and optimizing labor market conditions. The findings highlight the positive impact of state programs, such as the “Labor Program” (2021–2023), which contributed to an increase in employment and a decline in unemployment rates. Key results indicate that vocational training, entrepreneurship support, and job creation initiatives have enhanced workforce participation. However, significant gaps remain, including a lack of sustainable employment solutions, limited diversification of economic sectors, and the high prevalence of informal labor. These challenges underscore the need for a more integrated employment strategy, combining state support with private sector engagement. The study’s implications emphasize the necessity for continuous monitoring and policy adjustments to align labor market dynamics with economic development goals. Future research should focus on the long-term sustainability of employment programs and the adaptation of workforce policies to evolving labor market demands.

Social Sciences

S2 Open Access 2025

INVESTMENT ASPECTS OF UPDATING THE RESOURCE POTENTIAL OF AGRICULTURAL ENTERPRISES

N. Svynous

Introduction. The development of the agricultural sector in modern conditions is characterized by an increasing need to update the resource potential of agricultural enterprises. Physical and moral wear and tear of the material and technical base, degradation of land resources, shortage of qualified labor and limited access to financial resources hinder the increase in the efficiency of agricultural production. Methods. In the process of research, general scientific and special methods were used, in particular analysis and synthesis - to generalize theoretical approaches to investment activities and update the resource potential of agricultural enterprises. Economic-statistical and comparative methods were used to assess the state, dynamics and effectiveness of investment processes in agriculture. Systemic and structural-functional approaches were used to substantiate investment mechanisms for updating the resource potential of agricultural enterprises in the face of modern challenges. Results. It is substantiated that investment renewal of resource potential consists in the purposeful direction of financial resources for the reproduction, modernization and improvement of the qualitative characteristics of each component. The structure of investment processes in the agricultural sector of the economy is analyzed and key restraining factors of investment activity are identified, among which are the high level of production and financial risks inherent in agriculture, the financial structure of agricultural enterprises, limited access of agricultural enterprises to long-term lending. The need to improve institutional and financial mechanisms for stimulating investment activity is proven, in particular, the development of a system of state guarantees, insurance of investment risks and increasing the predictability of state investment policy. Discussion. 
Prospects for further research are related to the development of complex models of investment support for the renewal of the resource potential of agricultural enterprises, taking into account military, climatic and market risks. The current direction is to assess the effectiveness of investments in digital, resource-saving and environmentally friendly technologies from the standpoint of increasing the competitiveness of agricultural production. Further scientific research should be directed at studying the role of state support, financial instruments and public-private partnership in activating investment processes in the agricultural sector. Keywords: investment activity, resource potential, agricultural enterprises, resource renewal, investment mechanisms, state support, innovation, sustainable development.

en

S2 Open Access 2025

Motivational Determinants of the Quality of Working Life of Youth in Ukraine: Economic Aspects of Personality Self-Actualization

Denys Kozlov

The modern competitiveness of the state increasingly depends not on natural resources or material wealth, but on the ability of society to form and effectively realize labor potential. In the context of digitalization, global challenges and internal transformations, young people are faced with new requirements for professional self-realization, which actualizes the need to study the motivational factors of their labor activity. The purpose of the article is to deepen the theoretical and methodological principles for determining the determinants of the quality of working life of young people in Ukraine through the prism of their economic self-actualization. In the process of research, general scientific and special methods were used: analysis and synthesis, socio-economic modeling, statistical data processing and analytical grouping. As a result of the research, it was proposed to formalize the process of making economic decisions by young people regarding working life through the projection of Maslow's motivational model onto working life as an economic component of young people's activities; the organizational and economic mechanism of self-actualization of youth in the sphere of labor relations was conceptualized; and the crucial importance of the vector “type of economic activity” was established. In theoretical terms, the transition from social motivation processes to their economic projections provides the possibility of analytical assessment of economic and organizational factors of the quality of working life of Ukrainian youth. In practical terms, four typical models of sectoral use of human capital were identified, which allows forming a toolkit for an effective motivational policy focused on creating conditions for self-realization, which is a key factor in ensuring sustainable socio-economic development of the state.
Prospects for further research lie in studying the behavioral characteristics of labor market subjects for a deeper understanding of the logic of individual choices of young people and the mechanisms of differentiation of their economic behavior in the labor market. The article is empirical.

en

arXiv Open Access 2024

Policy Bifurcation in Safe Reinforcement Learning

Wenjun Zou, Yao Lyu, Jie Li et al.

Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous local optima can inevitably lead to constraint violations. We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of policy bifurcation in safe RL, which corresponds to the contractibility of the reachable tuple. Our theorem reveals that in scenarios where the obstacle-free state space is non-simply connected, a feasible policy is required to be bifurcated, meaning its output action needs to change abruptly in response to the varying state. To train such a bifurcated policy, we propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output. The bifurcated behavior can be achieved by selecting the Gaussian component with the highest mixing coefficient. Besides, MUPO also integrates spectral normalization and forward KL divergence to enhance the policy's capability of exploring different modes. Experiments with vehicle control tasks show that our algorithm successfully learns the bifurcated policy and ensures satisfying safety, while a continuous policy suffers from inevitable constraint violations.
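The component-selection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the policy head has already produced mixing coefficients and one mean vector per Gaussian component, and that the deterministic action is the mean of the dominant component; the function name `select_bifurcated_action` and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_bifurcated_action(
    mixing_coeffs: Sequence[float],
    component_means: Sequence[Sequence[float]],
) -> List[float]:
    """Return the mean of the mixture component with the largest mixing coefficient."""
    if not mixing_coeffs or len(mixing_coeffs) != len(component_means):
        raise ValueError("need one mean vector per mixing coefficient")
    # Index of the dominant component (ties resolve to the lowest index).
    best = max(range(len(mixing_coeffs)), key=lambda k: mixing_coeffs[k])
    return list(component_means[best])


# Hypothetical two-mode policy: steer left (-1.0) or right (+1.0) around an obstacle.
means = [[-1.0], [1.0]]
left = select_bifurcated_action([0.51, 0.49], means)
right = select_bifurcated_action([0.49, 0.51], means)
```

A small shift in the mixing coefficients flips the selected action from one mode to the other, which is the abrupt change a single smooth (unimodal) policy output cannot represent.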

en cs.LG

DOAJ Open Access 2024

Blockchain Technology and Social Policy Transformation: A Critical Examination and Recommendations

Nurten Ebru Özdemir, Gökçe Cerev, Doğa Başar Sarıipek et al.

In recent years, Blockchain technology has emerged as a transformative innovation with significant implications on both a global and local scale. Its potential to revolutionise social policy in Turkey is profound and offers avenues for the development of transparent, reliable and participatory systems. As the concept of work evolves, traditional social protection and security systems rooted in conventional employment models must adapt to this changing landscape. In an era prioritising citizenship ties, the importance of financial stability and efficiency has never been greater, necessitating the formulation of precise needs-based social protection policies. Through its ability to enhance the accuracy of identifying individuals in need, the blockchain holds the promise of optimising resource allocation and facilitating the effective implementation of needs-based social policies. This study seeks to delve into the transformative potential of Blockchain technology within social policy systems in Turkey, with a specific focus on social assistance practices, and to provide recommendations for its integration. The significance of this study lies in its exploration of Blockchain’s decentralised digital structure, which ensures data integrity and is pivotal in establishing fair and transparent social assistance mechanisms. By pioneering research in this area, this study aims to bridge existing gaps in the literature and contribute to a deeper understanding of Blockchain’s impact on social assistance in Turkey. Methodologically, the study will start with an exhaustive review of the relevant literature to establish a robust conceptual and theoretical framework. Subsequently, the collected data will undergo descriptive analysis to provide comprehensive interpretation and reporting.
In conclusion, the findings of this study affirm the unique benefits that Blockchain technology offers to the social policy system in Turkey, including increased efficiency, equity, and transparency within social assistance systems. By addressing implementation challenges, Blockchain can be tailored to suit the needs of social assistance programs, providing invaluable support to policymakers and decision-makers in shaping future policies. The integration of Blockchain technologies not only facilitates efficient resource allocation but also enables timely and accurate responses to social protection needs. Ultimately, it enables the simultaneous implementation of a rights-based, needs-driven social policy model, thus paving the way for transformative changes in social welfare systems.

Industrial relations, Social insurance. Social security. Pension

DOAJ Open Access 2024

Values and economic performance across European welfare state regimes: Direct and indirect effects through social capital, human capital and managerial skills.

Katarzyna Growiec, Marcin Czupryna, Jakub Growiec

The values that people hold are linked to their economic performance. These links can be either direct or indirect, operating through moderating variables such as social network participation, interpersonal trust, trust in institutions, human capital, managerial skills and hours worked. In this paper these effects are studied using structural equation modelling (SEM) methodology applied to European Social Survey data from 28 European countries in 2018. Schwartz classification of values is used, distinguishing between Self-Enhancement (Power, Achievement), Openness to Change (Self-Direction), Conservation (Tradition, Security, Conformity) and Self-Transcendence (Universalism, Benevolence) values. It is found that Power has the strongest positive direct effect on economic performance, further strengthened by a positive indirect structural effect through hours worked. Self-Direction is indirectly positively linked to economic performance through higher managerial skills and hours worked. Tradition has a strong negative direct effect on economic performance. Security is indirectly negatively linked with economic performance, owing to its negative effects on interpersonal trust, management skills and hours worked. Some of the identified effects are context-dependent and vary across European welfare state regimes. For example, Power is statistically significantly linked to economic performance only in the liberal and conservative regime. Values promoted by respective welfare state regimes are not necessarily associated with higher incomes within those regimes, e.g., Tradition and Security values promoted in the conservative and Mediterranean regime are associated with lower incomes. These findings may lead to a range of policy implications, particularly in relation to the policies on immigration, demographics, the labor market, and work-life balance. 
Unfortunately, due to the cross-sectional character of the dataset, causal relations among the variables of interest could not be identified.

Medicine, Science

DOAJ Open Access 2024

Suggesting global insights to local challenges: expanding financing of rehabilitation services in low and middle-income countries

Abdulgafoor M. Bachani, Jacob A. Bentley, Hunied Kautsar et al.

Purpose: Following the rapid transition to non-communicable diseases, increases in injury, and subsequent disability, the world, especially low and middle-income countries (LMICs), remains ill-equipped for increased demand for rehabilitative services and assistive technology. This scoping review explores rehabilitation financing models used throughout the world and identifies “state of the art” rehabilitation financing strategies to identify opportunities and challenges to expand financing of rehabilitation. Material and methods: We searched peer-reviewed and grey literature for articles containing information on rehabilitation financing in both LMICs and high-income countries. Results: Forty-two articles were included, highlighting various rehabilitation financing mechanisms, which involve user fees and other innovative payment approaches such as bundled or pooled schemes. Few studies explore policy options to increase investment in the supply of services. Conclusion: This paper highlights opportunities to expand rehabilitation services, namely through promotion of private investment, improvement in provider reimbursement mechanisms, expansion of educational grants to bolster labor supply incentives, and investment in public and private insurance schemes. Mechanisms of reimbursement are frequently based on global budget and salary, which are helpful to control cost escalation but represent important barriers to expanding the supply and quality of services.

Other systems of medicine, Medical technology

arXiv Open Access 2023

First-order Policy Optimization for Robust Policy Evaluation

Yan Li, Guanghui Lan

We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision process with $\mathrm{s}$-rectangular ambiguity sets. The developed method, named first-order policy evaluation (FRPE), provides the first unified framework for robust policy evaluation in both deterministic (offline) and stochastic (online) settings, with either tabular representation or generic function approximation. In particular, we establish linear convergence in the deterministic setting, and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity in the stochastic setting. FRPE also extends naturally to evaluating the robust state-action value function with $(\mathrm{s}, \mathrm{a})$-rectangular ambiguity sets. We discuss the application of the developed results for stochastic policy optimization of large-scale robust MDPs.

en math.OC, cs.LG

arXiv Open Access 2023

From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews

Alex Liu, Min Sun

Obtaining stakeholders' diverse experiences and opinions about current policy in a timely manner is crucial for policymakers to identify strengths and gaps in resource allocation, thereby supporting effective policy design and implementation. However, manually coding even moderately sized interview texts or open-ended survey responses from stakeholders can often be labor-intensive and time-consuming. This study explores the integration of Large Language Models (LLMs)--like GPT-4--with human expertise to enhance text analysis of stakeholder interviews regarding K-12 education policy within one U.S. state. Employing a mixed-methods approach, human experts developed a codebook and coding processes as informed by domain knowledge and unsupervised topic modeling results. They then designed prompts to guide GPT-4 analysis and iteratively evaluate different prompts' performances. This combined human-computer method enabled nuanced thematic and sentiment analysis. Results reveal that while GPT-4 thematic coding aligned with human coding by 77.89% at specific themes, expanding to broader themes increased congruence to 96.02%, surpassing traditional Natural Language Processing (NLP) methods by over 25%. Additionally, GPT-4 is more closely matched to expert sentiment analysis than lexicon-based methods. Findings from quantitative measures and qualitative reviews underscore the complementary roles of human domain expertise and automated analysis as LLMs offer new perspectives and coding consistency. The human-computer interactive approach enhances efficiency, validity, and interpretability of educational policy research.

en cs.HC, cs.AI

arXiv Open Access 2023

Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

Jared Markowitz, Edward W. Staley

To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences. Natural policy gradient methods, including Trust Region Policy Optimization (TRPO), seek to produce monotonic improvement through bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a commonly used, first-order algorithm that instead uses loss clipping to take multiple safe optimization steps per batch of data, replacing the bound on the single step of TRPO with regularization on multiple steps. In this work, we find that the performance of PPO, when applied to continuous action spaces, may be consistently improved through a simple change in objective. Instead of the importance sampling objective of PPO, we instead recommend a basic policy gradient, clipped in an equivalent fashion. While both objectives produce biased gradient estimates with respect to the RL objective, they also both display significantly reduced variance compared to the unbiased off-policy policy gradient. Additionally, we show that (1) the clipped-objective policy gradient (COPG) objective is on average "pessimistic" compared to both the PPO objective and (2) this pessimism promotes enhanced exploration. As a result, we empirically observe that COPG produces improved learning compared to PPO in single-task, constrained, and multi-task learning, without adding significant computational cost or complexity. Compared to TRPO, the COPG approach is seen to offer comparable or superior performance, while retaining the simplicity of a first-order method.
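For readers unfamiliar with the loss clipping that this abstract refers to, the sketch below shows the widely used PPO clipped surrogate for a single sample. This is the standard PPO-Clip form as commonly stated, not the COPG objective, whose exact expression the abstract does not give; the function name and the default `clip_eps=0.2` are illustrative assumptions.

```python
def ppo_clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- importance ratio pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate for the sampled action
    clip_eps  -- clipping range (0.2 is a common default, assumed here)
    """
    # Clamp the ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum keeps the objective a pessimistic bound on the unclipped term.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, the objective stops rewarding ratio increases beyond `1 + clip_eps`; with a negative advantage, it stops rewarding decreases below `1 - clip_eps`. That removes the incentive to move the policy far from the one that collected the batch, which is what allows several optimization steps on the same data.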

en cs.LG

DOAJ Open Access 2023

Manifestation’s Minimization of Social Exclusion In Ukraine Based on Overcoming Dysfunctionality of the Socio-Economic Systems Regulation

Maria Karpiak, Nazariy Popadynets, Viktoria Bondarenko et al.

Achievement of balanced development of socio-economic systems of different levels lies in the plane of effective regulation of processes arising as a result of the action of various factors associated with negative phenomena of both an economic and social nature. Ensuring effective regulatory influence on the part of state and regional authorities is a guarantee of sustainable socio-economic development of the state and its regions, reducing interregional disparities, increasing the investment attractiveness of territories, preventing complications on a political, economic, and interethnic basis. This, in turn, is the basis of national state policy, as well as regional policy as its integral component. The modern dynamism of socio-economic development poses new challenges to the state due to the need to counteract the negative effects of the incessant transformations. We consider the strengthening of inequality and processes of social polarization caused by changes in the social structure of society to be the most striking of them. These changes have resulted today in the spread of such a negative social phenomenon as social exclusion. The problem of social exclusion in Ukraine and its regions is primarily related to the dysfunctionality of the processes of regulation of the main spheres of ensuring citizens' livelihoods, in particular, the spheres of employment, wages, social guarantees and the availability of social services. In addition, the significant risk for individuals to fall into the category of socially excluded today is caused by the presence of other destabilizing factors of an economic nature. 
These factors include the presence of significant imbalances in the labor market, in particular, a high level of unemployment and a reduction in the number of jobs, a significant level of poverty, including among the working population, the risk of not getting a job in accordance with the acquired qualification level and other factors that are especially threatening for the preservation and development of the human potential of the state.

Education, Economics as a science

S2 Open Access 2023

RETURNING TO THE SCIENTIFIC ARTICLE BY PhD M.Kh. KHASSENOV «ON THE RESULTS OF THE ANALYSIS OF LEGISLATION OF THE REPUBLIC OF KAZAKHSTAN ON STATE GUARANTEES OF EQUAL RIGHTS AND EQUAL OPPORTUNITIES OF MEN AND WOMEN»

Yelnur Tureshovna Baimoldinova

Gender equality, as a provision on the prohibition of discrimination on the basis of sex, is enshrined in the Constitution, which states that “no one shall be subject to any discrimination for reasons of origin, social, property status, occupation, sex, race, nationality, language, attitude towards religion, convictions, place of residence or any other circumstances”. The article analyzes the scientific article by PhD M.Kh. Khasenov, “On the results of the analysis of legislation of the Republic of Kazakhstan on state guarantees of equal rights and equal opportunities of men and women”, published in the Bulletin of the Institute of Legislation of the Republic of Kazakhstan in 2019. The author analyzed the Law of the Republic of Kazakhstan dated December 8, 2009 No. 223-IV “On state guarantees of equal rights and equal opportunities of men and women”. The research done by the author and the proposals and recommendations developed have not lost their relevance despite the time that has passed. The study raises issues of inconsistency between the analyzed law and other legislative acts, and also notes the presence of unresolved aspects in ensuring gender equality. It raises the issues of equality of men and women in the economic and labor environment, remuneration, access to state and military service, judicial protection against restrictions and infringement of rights, harassment in the labor environment, etc. In his scientific work M.Kh. Khasenov provides suggestions and recommendations for solving these issues and further improving the legislation and state gender policy of the Republic of Kazakhstan.

en

arXiv Open Access 2022

CUP: Critic-Guided Policy Reuse

Jin Zhang, Siyuan Li, Chongjie Zhang

The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies. CUP utilizes the critic, a common component in actor-critic methods, to evaluate and choose source policies. At each state, CUP chooses the source policy that has the largest one-step improvement over the current target policy, and forms a guidance policy. The guidance policy is theoretically guaranteed to be a monotonic improvement over the current target policy. Then the target policy is regularized to imitate the guidance policy to perform efficient policy search. Empirical results demonstrate that CUP achieves efficient transfer and significantly outperforms baseline algorithms.
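The selection rule described in this abstract can be sketched in a simplified form. The sketch assumes deterministic policies and a critic exposed as a plain function `q_value(state, action)`; the paper works with the critic of an actor-critic learner, so this is only an illustration of the idea, and every name here (`guidance_action`, `q_value`, the toy table) is hypothetical.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


def guidance_action(
    state: State,
    target_policy: Policy,
    source_policies: Sequence[Policy],
    q_value: Callable[[State, Action], float],
) -> Action:
    """Return the action of the candidate policy that the critic rates highest at `state`.

    The current target policy is itself a candidate, so the chosen action is never
    rated below the target policy's own action by the critic.
    """
    candidates = [target_policy, *source_policies]
    # Ties resolve to the earliest candidate, i.e. the target policy comes first.
    best_policy = max(candidates, key=lambda pi: q_value(state, pi(state)))
    return best_policy(state)


# Toy critic: a lookup table of Q-values for one state (illustrative numbers only).
q_table = {("s0", "a"): 0.1, ("s0", "b"): 0.7, ("s0", "c"): 0.4}
chosen = guidance_action(
    "s0",
    target_policy=lambda s: "a",
    source_policies=[lambda s: "b", lambda s: "c"],
    q_value=lambda s, a: q_table[(s, a)],
)
```

Because only the existing critic is queried, no extra component has to be trained to decide which source policy to follow at a given state, which is the property the abstract emphasizes.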

en cs.AI

arXiv Open Access 2022

Large Language Models can Implement Policy Iteration

Ethan Brooks, Logan Walls, Richard L. Lewis et al.

This work presents In-Context Policy Iteration, an algorithm for performing Reinforcement Learning (RL), in-context, using foundation models. While the application of foundation models to RL has received considerable attention, most approaches rely on either (1) the curation of expert demonstrations (either through manual design or task-specific pretraining) or (2) adaptation to the task of interest using gradient methods (either fine-tuning or training of adapter layers). Both of these techniques have drawbacks. Collecting demonstrations is labor-intensive, and algorithms that rely on them do not outperform the experts from which the demonstrations were derived. All gradient techniques are inherently slow, sacrificing the "few-shot" quality that made in-context learning attractive to begin with. In this work, we present an algorithm, ICPI, that learns to perform RL tasks without expert demonstrations or gradients. Instead we present a policy-iteration method in which the prompt content is the entire locus of learning. ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment. In order to eliminate the role of in-weights learning (on which approaches like Decision Transformer rely heavily), we demonstrate our algorithm using Codex, a language model with no prior knowledge of the domains on which we evaluate it.

en cs.LG

arXiv Open Access 2022

Non-Markovian policies occupancy measures

Romain Laroche, Remi Tachet des Combes, Jacob Buckman

A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state. The family of Markovian policies is broad enough to be interesting, yet simple enough to be amenable to analysis. However, RL often involves more complex policies: ensembles of policies, policies over options, policies updated online, etc. Our main contribution is to prove that the occupancy measure of any non-Markovian policy, i.e., the distribution of transition samples collected with it, can be equivalently generated by a Markovian policy. This result allows theorems about the Markovian policy class to be directly extended to its non-Markovian counterpart, greatly simplifying proofs, in particular those involving replay buffers and datasets. We provide various examples of such applications to the field of Reinforcement Learning.

en cs.LG, eess.SY
