Paper Title
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition
Paper Authors
Paper Abstract
Knowledge of a disease includes information about various aspects of the disease, such as signs and symptoms, diagnosis, and treatment. This disease knowledge is critical for many health-related and biomedical tasks, including consumer health question answering, medical language inference, and disease name recognition. While pre-trained language models like BERT have shown success in capturing syntactic, semantic, and world knowledge from text, we find they can be further complemented by specific information such as knowledge of symptoms, diagnoses, treatments, and other disease aspects. Hence, we integrate BERT with disease knowledge to improve these important tasks. Specifically, we propose a new disease knowledge infusion training procedure and evaluate it on a suite of BERT models including BERT, BioBERT, SciBERT, ClinicalBERT, BlueBERT, and ALBERT. Experiments on these three tasks show that these models can be enhanced in nearly all cases, demonstrating the viability of disease knowledge infusion. For example, the accuracy of BioBERT on consumer health question answering improves from 68.29% to 72.09%, and new SOTA results are observed on two datasets. We make our data and code freely available.
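The abstract summarizes, but does not detail, the disease knowledge infusion procedure. As a minimal sketch only, assuming one plausible form of infusion, the snippet below continues masked-language-model training of a BERT model on disease aspect passages, masking the disease mention so the model must recover it from the surrounding symptom/diagnosis/treatment context. The model name (`bert-base-uncased`), the two example passages, and the hyperparameters are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch (NOT the authors' exact procedure): infuse disease knowledge
# by continued masked-LM training that masks disease names in aspect passages.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # illustrative choice
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Hypothetical (disease, aspect passage) pairs; a real corpus would supply
# passages covering signs and symptoms, diagnosis, treatment, and so on.
examples = [
    ("influenza", "Common symptoms of influenza include fever, cough, and fatigue."),
    ("diabetes", "Diagnosis of diabetes is typically confirmed with a blood glucose test."),
]

model.train()
for disease, passage in examples:
    inputs = tokenizer(passage, return_tensors="pt")
    labels = inputs["input_ids"].clone()  # keep original ids as MLM targets
    ids = inputs["input_ids"][0]
    disease_ids = tokenizer(disease, add_special_tokens=False)["input_ids"]
    # Replace each occurrence of the disease name (possibly spanning several
    # subword tokens) with [MASK] tokens.
    n = len(disease_ids)
    for i in range(len(ids) - n + 1):
        if ids[i : i + n].tolist() == disease_ids:
            ids[i : i + n] = tokenizer.mask_token_id
    # Compute loss only on the masked (disease-name) positions.
    labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

After such continued pre-training, the infused weights would then be fine-tuned on the downstream tasks (consumer health question answering, medical language inference, disease name recognition) in the usual way.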