Actor Double Critic Architecture for Dialogue System




Authors	Saffari Y., Salimi Sartakhti J. S.
Source	Journal of Electrical and Computer Engineering Innovations, 2023, Volume 11, Issue 2, pp. 363-372
Abstract	Background and Objectives: Most recent dialogue policy learning methods are based on reinforcement learning (RL). However, basic RL algorithms such as deep Q-network (DQN) have drawbacks in environments with large state and action spaces, such as dialogue systems. Most policy-based methods are slow because they estimate the action value by computing the sum of discounted rewards for each action. In value-based RL methods, function approximation errors lead to overestimated values and, ultimately, suboptimal policies. Some works try to resolve these problems by combining RL methods, but most of them were applied in game environments or focused only on combining DQN variants. This paper presents, for the first time in a dialogue system, a method that combines actor-critic and double DQN, named double actor-critic (DAC), which significantly improves the stability, speed, and performance of dialogue policy learning. Methods: In the actor-critic, to overcome the slow learning of normal DQN, the critic unit approximates the value function and evaluates the quality of the policy used by the actor, which means that the actor can learn the policy faster. Moreover, to overcome the overestimation issue of DQN, double DQN is employed. Finally, to obtain a smoother update, a heuristic loss is introduced that chooses the minimum of the actor-critic loss and the double DQN loss. Results: Experiments on a movie ticket booking task show that the proposed method learns more stably, without the drop that follows overestimation, and reaches the learning threshold in fewer episodes. Conclusion: Unlike previous works, which mostly proposed combinations of DQN variants, this study combines DQN variants with actor-critic to benefit from both policy-based and value-based RL methods and to overcome their two main issues, slow learning and overestimation. Experimental results show that, as a dialogue policy learner, the proposed method can conduct a more accurate conversation with a user.
Keywords	dialogue system, actor-critic, double DQN, task-based
Address	University of Kashan, Department of Electrical and Computer Engineering, Iran; University of Kashan, Department of Electrical and Computer Engineering, Iran
Email	salimi@kashanu.ac.ir
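
The Methods part of the abstract names two ingredients: a double DQN target, which counters overestimation, and a heuristic loss that takes the minimum of the actor-critic loss and the double DQN loss. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions. The double DQN target follows the standard formulation (the online network selects the next action, the target network evaluates it). The form of the actor-critic loss, the function names, and the default discount factor are illustrative assumptions, not the authors' implementation.

```python
import math


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Standard double DQN target.

    The online network's Q-values pick the next action; the target
    network's Q-values score it. gamma=0.9 is an assumed default.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def double_dqn_loss(q_taken, target):
    """Squared TD error for the action that was actually taken."""
    return (target - q_taken) ** 2


def actor_critic_loss(action_prob, state_value, target):
    """Assumed one-step actor-critic loss (not taken from the paper).

    Actor term: negative log-probability weighted by the advantage.
    Critic term: squared error of the state-value estimate.
    """
    advantage = target - state_value
    actor_term = -math.log(action_prob) * advantage
    critic_term = advantage ** 2
    return actor_term + critic_term


def heuristic_loss(ac_loss, ddqn_loss):
    """Heuristic from the abstract: use the smaller of the two losses."""
    return min(ac_loss, ddqn_loss)


if __name__ == "__main__":
    # Hypothetical Q-values for three dialogue actions in the next state.
    next_q_online = [0.2, 0.8, 0.5]
    next_q_target = [0.3, 0.4, 0.9]

    target = double_dqn_target(1.0, next_q_online, next_q_target)
    ddqn = double_dqn_loss(q_taken=1.0, target=target)
    ac = actor_critic_loss(action_prob=0.5, state_value=1.0, target=target)
    print("target:", target)
    print("loss used for the update:", heuristic_loss(ac, ddqn))
```

In this hypothetical example the online network prefers action 1, so the target uses the target network's value for action 1 (0.4) instead of its maximum (0.9). That substitution is how double DQN reduces overestimation compared with a plain DQN target.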


