Double Deep Q-Network with Adaptive Prioritized Experience Replay

Author

Adibian Majid, Ebadzadeh Mohammad Mahdi

Source

AUT Journal of Modeling and Simulation - 2025 - Volume: 57 - Issue: 1 - Pages: 53-62

Abstract

In deep reinforcement learning, experience replay buffers are used to reduce the effects of sequential data and make better use of past experiences. Prioritized experience replay (PER) improves upon random sampling by selecting transitions based on their temporal difference (TD) error. However, PER does not consider how important each transition is or how many times it has been used during training. In this paper, we propose a new method for adaptive prioritization that takes into account three additional transition-level factors: reward, usage count (counter), and policy probability, collectively referred to as RCP values. These values are normalized and used alongside the TD error to calculate the probability of selecting each transition from the replay buffer. We evaluate our method on several Atari environments and show that using any of the RCP values individually can improve performance compared to standard PER. To combine all three RCP components, we explore three aggregation functions: minimum, maximum, and mean. Experimental results show that the best aggregation method depends on the environment; however, the mean function generally provides stable improvements across tasks, as it balances all RCP signals and avoids over-relying on any single factor.
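
The abstract's sampling rule can be sketched in code. The following is a minimal illustration only: the abstract does not specify the exact normalization or how the aggregated RCP score is combined with the TD error, so the min-max normalization, the inverted usage count, the multiplicative blend, and the names `rcp_priorities`, `alpha`, and `eps` are all assumptions made here for clarity.

```python
import numpy as np

def rcp_priorities(td_errors, rewards, usage_counts, policy_probs,
                   aggregate=np.mean, alpha=0.6, eps=1e-6):
    """Sketch of RCP-style adaptive prioritization (assumed details).

    Blends |TD error| with three normalized transition-level signals
    (reward, usage counter, policy probability) into a sampling
    distribution over the replay buffer.
    """
    def minmax(x):
        x = np.asarray(x, dtype=np.float64)
        lo, hi = x.min(), x.max()
        return (x - lo) / (hi - lo + eps)

    # Normalize each RCP signal to [0, 1]. Usage counts are inverted
    # (an assumption) so rarely replayed transitions score higher.
    r = minmax(rewards)
    c = 1.0 - minmax(usage_counts)
    p = minmax(policy_probs)

    # Aggregate the three signals per transition; pass np.min, np.max,
    # or np.mean to mirror the paper's three aggregation functions.
    rcp = aggregate(np.stack([r, c, p]), axis=0)

    # Combine with the usual |TD error| priority and exponentiate,
    # as in standard PER; the multiplicative blend is an assumption.
    priorities = (np.abs(td_errors) + eps) * (rcp + eps)
    probs = priorities ** alpha
    return probs / probs.sum()
```

Swapping `aggregate` between `np.min`, `np.max`, and `np.mean` reproduces the three variants compared in the paper; per the abstract, the mean tends to be the most stable choice across environments.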

Keywords

deep reinforcement learning, prioritized experience replay, deep Q-network

Address

Amirkabir University of Technology, Department of Computer Engineering, Iran

Email

ebadzadeh@aut.ac.ir