Entity-based Claim Representation Improves Fact-Checking of Medical Content in Tweets

Paper Title

Entity-based Claim Representation Improves Fact-Checking of Medical Content in Tweets

Authors

Amelie Wührl, Roman Klinger

Abstract

False medical information on social media poses harm to people's health. While the need for biomedical fact-checking has been recognized in recent years, user-generated medical content has received comparably little attention. At the same time, models for other text genres might not be reusable, because the claims they have been trained with are substantially different. For instance, claims in the SciFact dataset are short and focused: "Side effects associated with antidepressants increases risk of stroke". In contrast, social media holds naturally-occurring claims, often embedded in additional context: "'If you take antidepressants like SSRIs, you could be at risk of a condition called serotonin syndrome' Serotonin syndrome nearly killed me in 2010. Had symptoms of stroke and seizure." This showcases the mismatch between real-world medical claims and the input that existing fact-checking systems expect. To make user-generated content checkable by existing models, we propose to reformulate the social-media input in such a way that the resulting claim mimics the claim characteristics in established datasets. To accomplish this, our method condenses the claim with the help of relational entity information and either compiles the claim out of an entity-relation-entity triple or extracts the shortest phrase that contains these elements. We show that the reformulated input improves the performance of various fact-checking models as opposed to checking the tweet text in its entirety.
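The two condensation variants named in the abstract (compiling an entity-relation-entity triple, or extracting the shortest phrase containing those elements) can be illustrated with a minimal Python sketch. It assumes the entity and relation mentions are already available as token spans; here they are annotated by hand, and the paper's actual extraction step is not reproduced. The names `Span`, `closest_triple`, `triple_claim` and `shortest_phrase_claim` are hypothetical, and choosing the mention combination with the shortest covering window (when an entity is mentioned more than once) is an assumption made for this sketch, not necessarily the authors' procedure.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Span(NamedTuple):
    """Hypothetical token span: start is inclusive, end is exclusive."""
    start: int
    end: int


Triple = Tuple[Span, Span, Span]  # (entity, relation, entity)


def span_text(tokens: List[str], span: Span) -> str:
    return " ".join(tokens[span.start:span.end])


def closest_triple(subjs: Sequence[Span], rels: Sequence[Span],
                   objs: Sequence[Span]) -> Optional[Triple]:
    """Assumption: among all mention combinations, keep the one whose
    covering token window is shortest. Returns None if any list is empty."""
    best: Optional[Triple] = None
    best_len: Optional[int] = None
    for s, r, o in product(subjs, rels, objs):
        length = max(s.end, r.end, o.end) - min(s.start, r.start, o.start)
        if best_len is None or length < best_len:
            best, best_len = (s, r, o), length
    return best


def triple_claim(tokens: List[str], triple: Triple) -> str:
    """Variant 1: compile the claim as 'entity relation entity'."""
    return " ".join(span_text(tokens, part) for part in triple)


def shortest_phrase_claim(tokens: List[str], triple: Triple) -> str:
    """Variant 2: the contiguous phrase from the first to the last element."""
    start = min(part.start for part in triple)
    end = max(part.end for part in triple)
    return span_text(tokens, Span(start, end))


# Usage on the tweet from the abstract (pre-tokenized by whitespace).
tweet = ("If you take antidepressants like SSRIs you could be at risk of a "
         "condition called serotonin syndrome . Serotonin syndrome nearly "
         "killed me in 2010 .")
tokens = tweet.split()
subjects = [Span(3, 4), Span(5, 6)]     # "antidepressants", "SSRIs"
relations = [Span(9, 12)]               # "at risk of"
objects = [Span(15, 17), Span(18, 20)]  # "serotonin syndrome", mentioned twice

triple = closest_triple(subjects, relations, objects)
assert triple is not None
print(triple_claim(tokens, triple))
# → SSRIs at risk of serotonin syndrome
print(shortest_phrase_claim(tokens, triple))
# → SSRIs you could be at risk of a condition called serotonin syndrome
```

Both outputs are far shorter than the full tweet and closer in form to a SciFact-style claim, which is the mismatch the abstract describes. The triple variant yields the most compact string at the cost of fluency, while the phrase variant keeps the original wording between the elements.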
