[RL] DAC: The Double Actor-Critic Architecture for Learning Options
Summary, ISLab, BUPT, Beijing
The option framework is reformulated as two parallel augmented MDPs. Under this new formulation, any policy optimization algorithm can be used off the shelf to learn the intra-option policies, the termination conditions, and the master policy over options. Applying an actor-critic (AC) algorithm to each augmented MDP yields the Double Actor-Critic (DAC) architecture. Combined with the PPO algorithm, DAC is evaluated in an empirical study on challenging robot simulation tasks.
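
The two-MDP view can be made concrete with a small sketch. It assumes a tabular setting and the usual call-and-return execution of options: at each step the previous option either terminates, in which case the master policy (the top-level policy that selects options) picks a new one, or it continues. The function names `high_policy` and `low_policy` and the array layouts are illustrative assumptions, not code from the source.

```python
import numpy as np


def high_policy(beta_prev, master_probs, prev_option):
    """Policy of the high-level augmented MDP (assumed tabular form).

    Augmented state: (previous option, current state). Action: the next option.

    beta_prev    -- probability that the previous option terminates
                    in the current state
    master_probs -- master policy's distribution over options
                    in the current state
    prev_option  -- index of the option that was running
    """
    # With probability beta_prev the option terminates and the
    # master policy chooses the next option.
    probs = beta_prev * np.asarray(master_probs, dtype=float)
    # With probability 1 - beta_prev the previous option continues.
    probs[prev_option] += 1.0 - beta_prev
    return probs


def low_policy(intra_option_probs, option):
    """Policy of the low-level augmented MDP (assumed tabular form).

    Augmented state: (current state, current option). Action: a primitive action.

    intra_option_probs -- array of shape (num_options, num_actions) holding
                          each option's action distribution in the current state
    option             -- index of the currently active option
    """
    return np.asarray(intra_option_probs, dtype=float)[option]


# Hypothetical numbers: two options, two primitive actions.
rng = np.random.default_rng(0)
option_probs = high_policy(beta_prev=0.25, master_probs=[0.5, 0.5], prev_option=0)
option = rng.choice(len(option_probs), p=option_probs)
action_probs = low_policy([[1.0, 0.0], [0.3, 0.7]], option)
action = rng.choice(len(action_probs), p=action_probs)
```

Written this way, each level is an ordinary stochastic policy over its own action set, which is why a standard actor-critic method such as PPO can be applied to each level without option-specific machinery.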