Paper Title
Synthesis from Satisficing and Temporal Goals
Paper Authors
Paper Abstract
Reactive synthesis from high-level specifications that combine hard constraints expressed in Linear Temporal Logic (LTL) with soft constraints expressed by discounted-sum (DS) rewards has applications in planning and reinforcement learning. An existing approach combines techniques from LTL synthesis with optimization for the DS rewards but has failed to yield a sound algorithm. An alternative approach combining LTL synthesis with satisficing DS rewards (rewards that achieve a threshold) is sound and complete for integer discount factors, but, in practice, a fractional discount factor is desired. This work extends the existing satisficing approach, presenting the first sound algorithm for synthesis from LTL and DS rewards with fractional discount factors. The utility of our algorithm is demonstrated on robotic planning domains.
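For context, the following is a minimal sketch of the discounted-sum and satisficing notions the abstract refers to, assuming the convention from prior satisficing work in which the discount factor is greater than one; the symbols $r_i$, $d$, and $v$ are illustrative and do not appear in the abstract itself.

\[
\mathrm{DS}(r_0 r_1 r_2 \cdots,\; d) \;=\; \sum_{i=0}^{\infty} \frac{r_i}{d^{\,i}}, \qquad d > 1
\]
\[
\text{satisficing with respect to a threshold } v: \quad \mathrm{DS}(r_0 r_1 r_2 \cdots,\; d) \;\geq\; v
\]

Under this convention, an integer discount factor means $d \in \{2, 3, \dots\}$, whereas the fractional case targeted by this work allows values such as $d = 3/2$.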