Paper Title


SMA-NBO: A Sequential Multi-Agent Planning with Nominal Belief-State Optimization in Target Tracking

Authors

Li, Tianqi, Krakow, Lucas W., Gopalswamy, Swaminathan

Abstract


In target tracking with mobile multi-sensor systems, sensor deployment impacts the observation capabilities and the resulting state estimation quality. Based on a partially observable Markov decision process (POMDP) formulation comprised of the observable sensor dynamics, unobservable target states, and accompanying observation laws, we present a distributed information-driven solution approach to the multi-agent target tracking problem, namely, sequential multi-agent nominal belief-state optimization (SMA-NBO). SMA-NBO seeks to minimize the expected tracking error via receding horizon control, including a heuristic expected cost-to-go (HECTG). SMA-NBO incorporates a computationally efficient approximation of the target belief-state over the horizon. The agent-by-agent decision-making leverages on-board (edge) compute to select (sub-optimal) target-tracking maneuvers that exhibit non-myopic cooperative fleet behavior. The optimization problem explicitly incorporates semantic information defining target occlusions from a world model. To illustrate the efficacy of our approach, a random occlusion forest environment is simulated. SMA-NBO is compared to other baseline approaches. The simulation results show SMA-NBO 1) maintains tracking performance and reduces the computational cost by replacing the calculation of the expected target trajectory with a single sample trajectory based on maximum a posteriori estimation; 2) generates cooperative fleet decisions by sequentially optimizing single-agent policies with efficient usage of other agents' policies of intent; 3) aptly incorporates the multiple weighted trace penalty (MWTP) HECTG, which improves tracking performance with a computationally efficient heuristic.
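The two core ideas in the abstract, sequential agent-by-agent optimization against predecessors' intent plans and a single nominal (MAP) target trajectory replacing the expectation over target motion, can be illustrated with a toy sketch. This is a minimal one-dimensional illustration under assumed dynamics, not the paper's algorithm: the action set `MOVES`, the distance-dependent observation-noise model in `meas_var`, and the scalar Kalman covariance recursion are all simplified stand-ins chosen for brevity.

```python
import itertools

H = 3                      # planning horizon (receding-horizon lookahead)
MOVES = [-1.0, 0.0, 1.0]   # toy per-step sensor motion primitives

def nominal_trajectory(x0, v, h):
    # Nominal belief-state: a single MAP sample trajectory stands in for
    # the expectation over target motion noise (the key approximation).
    return [x0 + v * (k + 1) for k in range(h)]

def meas_var(sensor_x, target_x):
    # Toy observation law: measurement noise grows with sensor-target distance.
    return 0.5 + 0.2 * (sensor_x - target_x) ** 2

def predicted_cost(plan, start_x, targets, others_plans, others_starts, p0):
    # Scalar Kalman covariance recursion along the nominal trajectory,
    # fusing this agent's plan with the plans already committed by others.
    p, cost, x = p0, 0.0, start_x
    for k, t in enumerate(targets):
        p += 0.3                       # process-noise inflation
        x += plan[k]
        info = 1.0 / meas_var(x, t)    # this agent's information gain
        for op, os_ in zip(others_plans, others_starts):
            ox = os_ + sum(op[:k + 1])
            info += 1.0 / meas_var(ox, t)
        p = 1.0 / (1.0 / p + info)     # posterior variance after fusion
        cost += p                      # trace penalty (scalar case)
    return cost

def sma_plan(agent_starts, x0, v, p0):
    # Sequential (agent-by-agent) decision-making: each agent best-responds
    # given the intent plans chosen by its predecessors in the sequence.
    targets = nominal_trajectory(x0, v, H)
    plans = []
    for i, s in enumerate(agent_starts):
        best = min(itertools.product(MOVES, repeat=H),
                   key=lambda pl: predicted_cost(pl, s, targets,
                                                 plans, agent_starts[:i], p0))
        plans.append(list(best))
    return plans
```

In this sketch, each agent's plan is chosen by exhaustive search over the small action set, so later agents naturally avoid redundant coverage: information already supplied by predecessors lowers the marginal value of observing from the same vantage.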
