Learning the Linear Quadratic Regulator from Nonlinear Observations

Paper Title

Learning the Linear Quadratic Regulator from Nonlinear Observations

Authors

Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

Abstract

We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
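
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the RichID algorithm; it only shows the ingredients of the problem under assumptions that do not come from the source: a double-integrator latent system (`A`, `B`), identity cost matrices (`Q`, `R`), and a hypothetical observation map `observe` that lifts the 2-dimensional latent state to a 50-dimensional nonlinear observation. The controller here reads the latent state directly, which is exactly what a learner in RichLQR cannot do and would have to recover through a decoder.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; returns K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K, P

def observe(x, W):
    """Hypothetical nonlinear observation map (an assumption for illustration)."""
    return np.tanh(W @ x)

rng = np.random.default_rng(0)

# Assumed latent system: a discretized double integrator with quadratic costs.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
K, P = lqr_gain(A, B, Q, R)

# Assumed random lifting from the 2-d latent state to a 50-d observation.
W = rng.standard_normal((50, 2))

x = np.array([1.0, 0.0])
for t in range(200):
    y = observe(x, W)   # high-dimensional observation the agent would receive
    u = -K @ x          # idealized controller with direct access to the latent state
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)
```

In this toy version the optimal gain `K` is computed from the known system matrices. In the setting of the abstract, both the dynamics and the mapping from `y` back to `x` are unknown, and the decoder would be fit from a function class using least-squares regression.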
