Paper Title


Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System

Authors

Zeng, Chen; Hecht, Grant; KrisshnaKumar, Prajit; Shah, Raj K.; Chowdhury, Souma; Botta, Eleonora M.

Abstract


A tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. However, earlier reliability-based optimization approaches to designing control actions remain challenging and computationally prohibitive when generalized over varying launch scenarios and target (debris) states relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allows evaluating the episodes of net-based target capture and estimating the capture quality index that serves as the reward feedback to PPO2. Here, the learned policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario.
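The abstract describes an RL setup in which the policy's action is the timing of the net closing, the simulator provides stochastic transitions (sensing and actuation noise), and the reward is a capture quality index. The toy sketch below illustrates that structure only; the environment (`TetherNetEnv`), its 1-D kinematics, the noise magnitudes, and the threshold policy are all hypothetical stand-ins for the paper's full multibody net-dynamics simulation and learned PPO2 policy.

```python
import random


class TetherNetEnv:
    """Toy stand-in for a net-dynamics simulator: a net travels along the
    launch axis toward a target, and the single decision is when to close it."""

    def __init__(self, target_range=10.0, sensor_noise=0.2, seed=0):
        self.rng = random.Random(seed)
        self.target_range = target_range  # true net-to-target distance at launch
        self.sensor_noise = sensor_noise  # std-dev of state-estimation error
        self.reset()

    def reset(self):
        self.net_pos = 0.0  # distance travelled by the net along the launch axis
        self.done = False
        return self._observe()

    def _observe(self):
        # Synthetic state-estimation uncertainty (stochastic observation).
        return self.net_pos + self.rng.gauss(0.0, self.sensor_noise)

    def step(self, close_net):
        """Action: whether to trigger the net-closing mechanism this step."""
        if close_net:
            # Capture quality index proxy: best when closing near the target.
            err = abs(self.net_pos - self.target_range) / self.target_range
            reward = max(0.0, 1.0 - err)
            self.done = True
        else:
            # Net advances with synthetic actuation noise.
            self.net_pos += 1.0 + self.rng.gauss(0.0, 0.05)
            reward = 0.0
            self.done = self.net_pos > 2 * self.target_range  # overshoot: miss
        return self._observe(), reward, self.done


def rollout(env, close_at):
    """Evaluate a simple threshold policy: close once the *observed*
    net position exceeds `close_at`. Returns the episode reward."""
    obs = env.reset()
    total = 0.0
    while True:
        obs, reward, done = env.step(close_net=obs >= close_at)
        total += reward
        if done:
            return total
```

A learned policy would replace the fixed threshold with a function of the full net and target state; here, comparing `rollout(env, 10.0)` against a late trigger like `rollout(env, 19.0)` shows how the capture-quality reward discriminates between closing times despite the injected noise.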
