Paper Title

Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records

Authors

Daeyoung Kim, Seongsu Bae, Seungho Kim, Edward Choi

Abstract

Question answering on electronic health records (EHR-QA) has a significant impact on the healthcare domain and is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into a query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to the data types pre-defined by the specific query language. To expand the EHR-QA task beyond this limitation, so that it can handle multi-modal medical data and solve complex inference in the future, a more primitive systemic language is needed. In this paper, we design a program-based model (NLQ2Program) for EHR-QA as a first step in this direction. We tackle MIMICSPARQL*, a graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without gold programs, our proposed model shows performance comparable to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply an uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirm that data uncertainty is most indicative of the ambiguity in the input question.
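The abstract does not say which uncertainty decomposition is used, so the following is only an illustrative sketch of one common choice, not the authors' method. It assumes an ensemble (or several stochastic forward passes) that each yield a categorical distribution for a single prediction, and splits the total predictive entropy into a data term (the average entropy of the individual members) and a model term (the remainder, which is the mutual information). The function names `entropy` and `decompose_uncertainty` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty into data and model parts.

    member_probs: list of categorical distributions, one per ensemble
    member (an assumption for this sketch), all over the same classes.
    Returns (total, data, model) where total = data + model.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Average the members to get the ensemble's predictive distribution.
    mean = [sum(m[i] for m in member_probs) / n for i in range(k)]
    total = entropy(mean)                             # total uncertainty
    data = sum(entropy(m) for m in member_probs) / n  # expected entropy
    model = total - data                              # mutual information
    return total, data, model


if __name__ == "__main__":
    # Members agree that the outcome is a coin flip: uncertainty is in the data.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
    # Members are each confident but disagree: uncertainty is in the model.
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

Under this reading, an ambiguous question would be expected to resemble the first case, where every member is individually unsure, rather than the second, where the members disagree with each other.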
