"Covid vaccine is against Covid but Oxford vaccine is made at Oxford!" Semantic Interpretation of Proper Noun Compounds

Paper Title

"Covid vaccine is against Covid but Oxford vaccine is made at Oxford!" Semantic Interpretation of Proper Noun Compounds

Authors

Keshav Kolluru, Gabriel Stanovsky, Mausam

Abstract

Proper noun compounds, e.g., "Covid vaccine", convey information in a succinct manner (a "Covid vaccine" is a "vaccine that immunizes against the Covid disease"). These are commonly used in short-form domains, such as news headlines, but are largely ignored in information-seeking applications. To address this limitation, we release a new manually annotated dataset, ProNCI, consisting of 22.5K proper noun compounds along with their free-form semantic interpretations. ProNCI is 60 times larger than prior noun compound datasets and also includes non-compositional examples, which have not been previously explored. We experiment with various neural models for automatically generating the semantic interpretations from proper noun compounds, ranging from few-shot prompting to supervised learning, with varying degrees of knowledge about the constituent nouns. We find that adding targeted knowledge, particularly about the common noun, results in performance gains of up to 2.8%. Finally, we integrate our model-generated interpretations with an existing Open IE system and observe a 7.5% increase in yield at a precision of 85%. The dataset and code are available at https://github.com/dair-iitd/pronci.
