Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Paper Title

Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Authors

Haozheng Luo, Ruiyang Qin, Chenwei Xu, Guo Ye, Zening Luo

Abstract

In this paper, we introduce a robotic agent specifically designed to analyze external environments and address participants' questions. The primary focus of this agent is to assist individuals using language-based interactions within video-based scenes. Our proposed method integrates video recognition technology and natural language processing models within the robotic agent. We investigate the crucial factors affecting human-robot interactions by examining pertinent issues arising between participants and robot agents. Methodologically, our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model demonstrates a 2% to 3% performance enhancement in comparison to other benchmark methods.
