Paper Title
Relational-Grid-World: A Novel Relational Reasoning Environment and An Agent Model for Relational Information Extraction
Paper Authors
Paper Abstract
Reinforcement learning (RL) agents are often designed for a specific problem, and their working processes are generally uninterpretable. Agent algorithms based on statistical methods can be improved in terms of generalizability and interpretability by using symbolic Artificial Intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we use the PrediNet network architecture in a dynamic decision-making problem rather than in image-based tasks, with the Multi-Head Dot-Product Attention Network (MHDPA) as a baseline for performance comparison. We tested the two networks in two environments: the baseline Box-World environment and our novel environment, Relational-Grid-World (RGW). The procedurally generated RGW environment, which is complex in terms of visual perception and combinatorial selection, makes it easy to measure the relational representation performance of RL agents. The experiments were carried out with different configurations of the environment so that the presented module and environment could be compared with the baselines. With the PrediNet architecture we reached policy optimization performance similar to that of MHDPA; in addition, we were able to extract the propositional representations explicitly, which makes the agent's statistical policy logic more interpretable and tractable. This flexibility in the agent's policy facilitates the design of non-task-specific agent architectures. The main contributions of this study are two-fold: an RL agent that can explicitly perform relational reasoning, and a new environment that measures the relational reasoning capabilities of RL agents.
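To make the baseline relational module concrete, the sketch below implements plain multi-head dot-product attention (MHDPA) over a set of entity feature vectors in NumPy. It is a minimal, self-contained illustration with randomly initialized weights, not the authors' implementation; the function name `mhdpa`, the entity count, and the feature dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mhdpa(entities, num_heads, d_k, rng):
    """Multi-head dot-product attention over a set of entity vectors.

    entities: (n, d) array, one row per environment object/feature vector.
    Returns an (n, num_heads * d_k) array of relational features.
    Weights are drawn randomly here purely for illustration; in a trained
    agent they would be learned parameters.
    """
    n, d = entities.shape
    heads = []
    for _ in range(num_heads):
        W_q = rng.normal(scale=d ** -0.5, size=(d, d_k))
        W_k = rng.normal(scale=d ** -0.5, size=(d, d_k))
        W_v = rng.normal(scale=d ** -0.5, size=(d, d_k))
        Q, K, V = entities @ W_q, entities @ W_k, entities @ W_v
        attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n, n) pairwise attention weights
        heads.append(attn @ V)                  # attended entity features per head
    return np.concatenate(heads, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid_entities = rng.normal(size=(16, 8))    # e.g., 16 grid objects with 8-dim features
    out = mhdpa(grid_entities, num_heads=2, d_k=4, rng=rng)
    print(out.shape)                            # (16, 8)
```

Each head lets every entity attend to every other entity, so the output features encode pairwise relations between environment objects; PrediNet differs in that it selects entity pairs explicitly and outputs relation values that can be read off as propositions.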