Paper Title
Associative Memory Based Experience Replay for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Experience replay is an essential component in deep reinforcement learning (DRL): it stores past experiences and samples them for the agent to learn from in real time. Recently, prioritized experience replay (PER) has been proven to be powerful and is widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach: an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely used but time-consuming tree-traversal-based priority sampling in PER while preserving the learning performance. Further, we design an in-memory computing hardware architecture based on AM to support AMPER by leveraging parallel in-memory search operations. AMPER shows comparable learning performance while achieving a 55x to 270x latency improvement when running on the proposed hardware, compared to the state-of-the-art PER running on a GPU.
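For context, the tree-traversal-based priority sampling that the abstract identifies as the latency bottleneck is typically implemented with a sum tree in conventional PER. The sketch below is a minimal, illustrative Python implementation of that baseline mechanism (not the authors' AMPER design); the class and variable names are hypothetical, and it assumes the standard proportional-prioritization scheme, where sampling requires a data-dependent root-to-leaf traversal with irregular memory accesses.

```python
# Minimal sketch of sum-tree priority sampling as used by conventional PER
# (illustrative only; not the paper's AMPER method or hardware).
import random

class SumTree:
    """Array-backed binary sum tree over experience priorities."""

    def __init__(self, capacity: int):
        self.capacity = capacity              # max number of stored experiences
        self.tree = [0.0] * (2 * capacity)    # index 1 is the root; leaves start at `capacity`
        self.write = 0                        # next leaf slot to overwrite

    def add(self, priority: float) -> None:
        # Store a priority at the next leaf and propagate the change upward.
        self.update(self.write + self.capacity, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx: int, priority: float) -> None:
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 1:                        # walk up, fixing partial sums
            idx //= 2
            self.tree[idx] += change

    def sample(self) -> int:
        # Draw a prefix-sum target, then traverse root -> leaf to locate it.
        # Each step is a data-dependent, irregular memory access, which is
        # what makes this costly on CPU/GPU.
        s = random.uniform(0.0, self.tree[1])
        idx = 1
        while idx < self.capacity:            # stop once a leaf is reached
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx - self.capacity            # index of the sampled experience

# Usage: priorities (e.g., proportional to TD error) are added/updated as training proceeds.
tree = SumTree(capacity=8)
for p in [0.5, 1.2, 0.3, 2.0]:
    tree.add(p)
print(tree.sample())
```

In this baseline, both updating a priority and drawing a sample cost O(log N) pointer-chasing steps per experience; AMPER's contribution, per the abstract, is to replace this traversal with AM-friendly parallel in-memory search operations.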