Title
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Authors
Abstract
Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context -- incorporating its pragmatics. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meaning "No". To investigate whether LLMs have the ability to make this type of inference, known as an implicature, we design a simple task and evaluate four categories of widely used state-of-the-art models. We find that, despite only evaluating on utterances that require a binary inference (yes or no), models in three of these categories perform close to random. However, LLMs instruction-tuned at the example-level perform significantly better. These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models. We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.
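The binary implicature task described above can be sketched in a few lines: format each (question, indirect response) pair as a prompt, elicit a yes/no answer, and score it against the implied meaning. This is a minimal illustration, not the paper's actual prompt format; the second example and the template wording are hypothetical additions.

```python
# Sketch of the binary implicature evaluation described in the abstract.
# The prompt template and the second example are illustrative assumptions;
# only the fingerprints/gloves example comes from the abstract itself.

PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Response: {response}\n"
    "Does the response mean yes or no? Answer:"
)

# (question, indirect response, implied binary answer)
examples = [
    ("Did you leave fingerprints?", "I wore gloves.", "no"),
    ("Are you coming to the party?", "I already put my coat on.", "yes"),
]

def make_prompts(examples):
    """Render each example into a prompt for a model to complete."""
    return [
        PROMPT_TEMPLATE.format(question=q, response=r)
        for q, r, _ in examples
    ]

def score(model_answers, examples):
    """Accuracy of yes/no guesses against the implied answers.

    Random guessing on this binary task yields roughly 0.5, the
    chance baseline the abstract compares models against.
    """
    correct = sum(
        guess.strip().lower() == gold
        for guess, (_, _, gold) in zip(model_answers, examples)
    )
    return correct / len(examples)

prompts = make_prompts(examples)
accuracy = score(["no", "yes"], examples)  # a perfect model scores 1.0
```

Scoring only a single yes/no token keeps the evaluation free of generation-quality confounds, which is what lets the abstract compare model families directly against the 0.5 chance baseline.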