Parameterized Reinforcement Learning for Optical System Optimization

Paper Title

Parameterized Reinforcement Learning for Optical System Optimization

Authors

Heribert Wankerl, Maike L. Stern, Ali Mahdavi, Christoph Eichler, Elmar W. Lang

Abstract

Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design parameters to describe a multi-layer stack: each layer's dielectric material, each layer's thickness, and the total number of layers. This combination of discrete and continuous parameters poses a challenging optimization problem that often requires a computationally expensive search for an optimal system design. Hence, most methods determine only the optimal thicknesses of the system's layers. To incorporate the layer material and the total number of layers as well, we propose a method that treats the stacking of consecutive layers as parameterized actions in a Markov decision process. We propose an exponentially transformed reward signal that eases policy optimization, and we adapt a recent variant of Q-learning for inverse design optimization. We demonstrate that our method outperforms human experts and a naive reinforcement learning algorithm in terms of the achieved optical characteristics. Moreover, the learned Q-values contain information about the optical properties of multi-layer optical systems, thereby allowing physical interpretation or what-if analysis.
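The idea of stacking layers as parameterized actions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LayerAction`, `step`, and `exp_reward`, the use of `None` as a "stop stacking" action, the material labels, and the specific form exp(-scale * error) are all assumptions chosen for illustration. The abstract states only that the reward is exponentially transformed, not how.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LayerAction:
    """One parameterized action: a discrete choice plus a continuous parameter."""
    material: str        # discrete part (hypothetical labels such as "SiO2")
    thickness_nm: float  # continuous part


def step(stack: List[LayerAction],
         action: Optional[LayerAction],
         max_layers: int) -> Tuple[List[LayerAction], bool]:
    """Apply one action to the current stack and return (new_stack, done).

    Assumption: `None` means "stop stacking", which is how this sketch lets
    the agent decide the total number of layers.
    """
    if action is None:
        return list(stack), True
    new_stack = list(stack) + [action]
    return new_stack, len(new_stack) >= max_layers


def exp_reward(error: float, scale: float = 1.0) -> float:
    """Map a non-negative design error to a reward in (0, 1].

    Assumed form exp(-scale * error); the paper's actual transform may differ.
    """
    return math.exp(-scale * error)


if __name__ == "__main__":
    stack, done = step([], LayerAction("SiO2", 100.0), max_layers=3)
    stack, done = step(stack, LayerAction("TiO2", 55.0), max_layers=3)
    stack, done = step(stack, None, max_layers=3)  # agent stops early
    print(len(stack), done, exp_reward(0.2))
```

In this sketch the episode ends either when the agent emits the stop action or when the layer budget is exhausted, so a single policy covers all three design parameters named in the abstract.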

Paper Title