LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification

Paper title

LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification

Authors

Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W Black, Yulia Tsvetkov

Abstract

In this paper, we describe our submission for the task of Propaganda Span Identification in news articles. We introduce a BERT-BiLSTM-based span-level propaganda classification model that identifies which token spans within a sentence are indicative of propaganda. The "multi-granular" model incorporates linguistic knowledge at several levels of text granularity, including word-, sentence-, and document-level syntactic, semantic, and pragmatic affect features, which significantly improve performance over the model's language-agnostic variant. To facilitate better representation learning, we also collect a corpus of 10k news articles and use it to fine-tune the model. The final model is a majority-voting ensemble whose members learn different propaganda class boundaries by leveraging different subsets of the incorporated knowledge; it places 4th on the test leaderboard. Our final model and code are released at https://github.com/sopu/PropagandaSemEval2020.
