More Embeddings, Better Sequence Labelers?

Paper Title

More Embeddings, Better Sequence Labelers?

Authors

Wang, Xinyu; Jiang, Yong; Bach, Nguyen; Wang, Tao; Huang, Zhongqiang; Huang, Fei; Tu, Kewei

Abstract

Recent work proposes a family of contextual embeddings that significantly improves the accuracy of sequence labelers over non-contextual embeddings. However, there is no definite conclusion on whether we can build better sequence labelers by combining different kinds of embeddings in various settings. In this paper, we conduct extensive experiments on 3 tasks over 18 datasets and 8 languages to study the accuracy of sequence labeling with various embedding concatenations and make three observations: (1) concatenating more embedding variants leads to better accuracy in rich-resource and cross-domain settings and some conditions of low-resource settings; (2) concatenating additional contextual sub-word embeddings with contextual character embeddings hurts the accuracy in extremely low-resource settings; (3) based on the conclusion of (1), concatenating additional similar contextual embeddings cannot lead to further improvements. We hope these conclusions can help people build stronger sequence labelers in various settings.
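The central operation the abstract studies, concatenating several embedding variants into a single token representation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedders are hypothetical toy stand-ins (a random per-word lookup table playing the role of a non-contextual embedding, and a +/-1-token window average playing the role of a contextual one), and the names make_lookup_embedder, make_window_embedder and concat_embeddings are invented for this example.

```python
import random
from typing import Callable, Dict, List

# Hypothetical embedder interface: one vector per token of the input sentence.
Embedder = Callable[[List[str]], List[List[float]]]


def make_lookup_embedder(dim: int, seed: int) -> Embedder:
    """Toy non-contextual embedder (illustrative only): each word type gets one
    fixed random vector, regardless of the sentence it appears in."""
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}

    def embed(tokens: List[str]) -> List[List[float]]:
        out = []
        for tok in tokens:
            if tok not in table:
                table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            out.append(table[tok])
        return out

    return embed


def make_window_embedder(base: Embedder) -> Embedder:
    """Toy 'contextual' embedder (illustrative only): a token's vector is the
    mean of the base vectors of itself and its immediate neighbours, so the
    same word gets different vectors in different sentences."""
    def embed(tokens: List[str]) -> List[List[float]]:
        base_vecs = base(tokens)
        out = []
        for i in range(len(tokens)):
            window = base_vecs[max(0, i - 1): i + 2]
            dim = len(window[0])
            out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
        return out

    return embed


def concat_embeddings(tokens: List[str], embedders: List[Embedder]) -> List[List[float]]:
    """Concatenate every embedder's vector for each token along the feature axis."""
    per_embedder = [e(tokens) for e in embedders]
    for vecs in per_embedder:
        if len(vecs) != len(tokens):
            raise ValueError("each embedder must return one vector per token")
    return [[x for vecs in per_embedder for x in vecs[i]] for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = ["EU", "rejects", "German", "call"]
    word = make_lookup_embedder(dim=4, seed=0)
    other = make_lookup_embedder(dim=3, seed=1)  # a second hypothetical variant
    ctx = make_window_embedder(make_lookup_embedder(dim=5, seed=2))
    features = concat_embeddings(tokens, [word, other, ctx])
    print(len(features), len(features[0]))  # 4 12
```

Each embedder contributes its own slice of every token's feature vector, so the concatenated width is the sum of the individual widths (4 + 3 + 5 = 12 here). Adding or dropping an embedder only changes that width, which is what makes "more embeddings" a cheap experimental knob; whether it actually helps is the empirical question the abstract's three observations address.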