Universal Learning Waveform Selection Strategies for Adaptive Target Tracking

Paper Title

Universal Learning Waveform Selection Strategies for Adaptive Target Tracking

Authors

Thornton, Charles E.; Buehrer, R. Michael; Dhillon, Harpreet S.; Martone, Anthony F.

Abstract

Online selection of optimal waveforms for target tracking with active sensors has long been a problem of interest. Many conventional solutions take an estimation-theoretic view, in which a waveform-specific Cramér-Rao lower bound on measurement error is used to select the optimal waveform at each tracking step. However, this approach is valid only in the high-SNR regime and requires a rather restrictive set of assumptions about the target motion and measurement models. Further, because of computational concerns, many traditional approaches are limited to near-term, or myopic, optimization, even though radar scenes exhibit strong temporal correlation. More recently, reinforcement learning has been proposed for waveform selection, with the problem framed as a Markov decision process (MDP), which allows for long-term planning. However, a major limitation of reinforcement learning is that the memory length of the underlying Markov process is often unknown for realistic target and channel dynamics, so a more general framework is desirable. This work develops a universal sequential waveform selection scheme that asymptotically achieves Bellman optimality in any radar scene that can be modeled as a $U^{\text{th}}$-order Markov process for a finite, but unknown, integer $U$. Our approach is based on well-established tools from the field of universal source coding, where a stationary source is parsed into variable-length phrases in order to build a context tree, which is used as a probabilistic model for the scene's behavior. We show that an algorithm based on a multi-alphabet version of the Context-Tree Weighting (CTW) method can be used to optimally solve a broad class of waveform-agile tracking problems while making minimal assumptions about the environment's behavior.
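
To make the context-tree idea in the abstract more concrete, the following is a minimal sketch of a multi-alphabet context-tree weighting predictor. It illustrates only the probabilistic sequence model, not the paper's waveform selection scheme. Several choices here are assumptions rather than details taken from the paper: the class name `ContextTreeWeighting`, the fixed maximum depth `max_depth` (standing in for an upper bound on the unknown order $U$), the Dirichlet(1/2) (Krichevsky-Trofimov style) estimator at each node, the equal 1/2 mixing weights, and padding the initial history with symbol 0.

```python
import math


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class _Node:
    """Statistics for one context (one node of the context tree)."""

    def __init__(self, alphabet_size):
        self.counts = [0] * alphabet_size  # symbol counts seen after this context
        self.log_pe = 0.0        # log of the Dirichlet(1/2) estimate at this node
        self.log_pw = 0.0        # log of the weighted (mixture) probability
        self.log_children = 0.0  # log of the product of the children's log_pw


class ContextTreeWeighting:
    """Hypothetical multi-alphabet CTW predictor over symbols 0..alphabet_size-1."""

    def __init__(self, alphabet_size, max_depth):
        self.m = alphabet_size
        self.depth = max_depth
        self.nodes = {}                  # context tuple (most recent first) -> _Node
        self.history = [0] * max_depth   # assumption: pad the unseen past with 0

    def _root_log_pw_after(self, symbol, commit):
        """Return the root's log weighted probability after appending `symbol`.

        Walks the context path from the deepest node up to the root. The tree
        is only modified when `commit` is True.
        """
        recent = self.history[-self.depth:] if self.depth > 0 else []
        ctx = tuple(reversed(recent))
        child_delta = 0.0  # change in log_pw of the child on the path
        log_pw = 0.0
        for d in range(self.depth, -1, -1):
            key = ctx[:d]
            node = self.nodes.get(key)
            if node is None:
                node = _Node(self.m)
            total = sum(node.counts)
            log_pe = node.log_pe + math.log(
                (node.counts[symbol] + 0.5) / (total + self.m / 2)
            )
            if d == self.depth:
                # Deepest node: no children to mix with.
                log_children = 0.0
                log_pw = log_pe
            else:
                # Internal node: equal-weight mix of the local estimate and
                # the product of the children's weighted probabilities.
                log_children = node.log_children + child_delta
                log_pw = math.log(0.5) + _logaddexp(log_pe, log_children)
            child_delta = log_pw - node.log_pw
            if commit:
                node.counts[symbol] += 1
                node.log_pe = log_pe
                node.log_pw = log_pw
                node.log_children = log_children
                self.nodes[key] = node
        return log_pw

    def predict(self):
        """Predictive distribution over the next symbol given the history."""
        root = self.nodes.get(())
        base = root.log_pw if root is not None else 0.0
        return [
            math.exp(self._root_log_pw_after(a, commit=False) - base)
            for a in range(self.m)
        ]

    def update(self, symbol):
        """Observe one symbol and update every node on its context path."""
        self._root_log_pw_after(symbol, commit=True)
        self.history.append(symbol)
```

In use, one would call `update` with each observed (quantized) scene state and `predict` to obtain a distribution over the next state. Mapping radar measurements to a finite alphabet, and turning the predictions into a waveform choice, are outside the scope of this sketch.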
