Paper Title

Feeding What You Need by Understanding What You Learned

Authors

Xiaoqiang Wang, Bang Liu, Fangli Xu, Bo Long, Siliang Tang, Lingfei Wu

Abstract

Machine Reading Comprehension (MRC) reveals the ability to understand a given text passage and answer questions based on it. Existing research in MRC relies heavily on large models and corpora to improve performance as measured by metrics such as Exact Match ($EM$) and $F_1$. However, this paradigm offers little interpretation of model capabilities and cannot train a model efficiently on a large corpus. In this paper, we argue that a deep understanding of model capabilities and data properties can help us feed a model appropriate training data based on its learning status. Specifically, we design an MRC capability assessment framework that assesses model capabilities in an explainable and multi-dimensional manner. Based on it, we further uncover and disentangle the connections between various data properties and model performance. Finally, to verify the effectiveness of the proposed MRC capability assessment framework, we incorporate it into a curriculum learning pipeline and devise a Capability Boundary Breakthrough Curriculum (CBBC) strategy, which performs model-capability-based training to maximize data value and improve training efficiency. Extensive experiments demonstrate that our approach significantly improves performance, achieving up to an 11.22% / 8.71% improvement in $EM$ / $F_1$ on MRC tasks.
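The abstract reports its gains in Exact Match ($EM$) and $F_1$. For readers unfamiliar with these metrics, below is a minimal sketch of how they are commonly computed for extractive MRC, assuming the SQuAD v1.1 convention (normalize the answer text, then compare strings for $EM$ and token overlap for $F_1$, keeping the best score over all reference answers). This is an assumption about the evaluation setup, not the paper's own scoring code, and the helper names (`normalize_answer`, `exact_match`, `f1_score`, `best_over_references`) are illustrative.

```python
import re
import string
from collections import Counter

# Punctuation characters stripped during normalization (SQuAD v1.1-style assumption).
PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, ground_truths):
    """Score against every reference answer and keep the highest score."""
    return max(metric(prediction, gt) for gt in ground_truths)


if __name__ == "__main__":
    pred = "the Eiffel Tower"
    golds = ["Eiffel Tower", "The Eiffel Tower in Paris"]
    print(best_over_references(exact_match, pred, golds))  # 1.0
    print(best_over_references(f1_score, pred, golds))     # 1.0
```

Against the longer reference alone, the same prediction would get $EM = 0$ but partial $F_1$ (precision 1, recall 0.5, so $F_1 \approx 0.67$). This is why the two metrics are usually reported together, as in the abstract.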