Paper Title

Text Segmentation by Cross Segment Attention

Paper Authors

Michal Lukasik, Boris Dadachev, Gonçalo Simões, Kishore Papineni

Paper Abstract

Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a new state-of-the-art, reducing in particular the error rates by a large margin in all cases. We further analyze model sizes and find that we can build models with many fewer parameters while keeping good performance, thus facilitating real-world applications.
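
To make the segmentation task concrete, below is a minimal sketch of one common way to frame it: score each candidate break point by feeding the context on either side into a pretrained transformer and classifying whether a boundary falls between them. This is an illustrative assumption, not the paper's exact architecture; the checkpoint name, the context-pairing scheme, and the `boundary_probability` helper are hypothetical, and the classification head shown here is untrained and would need fine-tuning on labeled boundaries before its scores mean anything.

```python
# Sketch: per-break-point binary classification with a pretrained transformer.
# Assumptions: bert-base-uncased checkpoint, left/right context fed as a sentence
# pair, label 1 = "segment boundary here". Not the paper's exact model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # head is randomly initialized; fine-tune it
)
model.eval()

def boundary_probability(left_context: str, right_context: str) -> float:
    """Return the (untrained-head) probability of a boundary between the contexts."""
    inputs = tokenizer(
        left_context, right_context,
        truncation=True, max_length=128, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 2)
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Usage: score every sentence gap in a toy document.
sentences = [
    "The concert ended after midnight.",
    "Fans lingered outside the venue.",
    "In other news, the market opened lower today.",
]
for i in range(1, len(sentences)):
    p = boundary_probability(" ".join(sentences[:i]), " ".join(sentences[i:]))
    print(f"gap {i}: P(boundary) = {p:.3f}")
```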
