Paper Title
Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records
Authors
Abstract
Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain and is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into a query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to the data types pre-defined by the specific query language. To expand the EHR-QA task beyond this limitation, so that it can handle multi-modal medical data and solve complex inference in the future, a more primitive systemic language is needed. In this paper, we design a program-based model (NLQ2Program) for EHR-QA as a first step in this direction. We tackle MIMICSPARQL*, a graph-based EHR-QA dataset, with a program-based approach trained in a semi-supervised manner to overcome the absence of gold programs. Even without gold programs, our proposed model performs comparably to the previous state-of-the-art model, an NLQ2Query model (0.9% gain). In addition, toward a reliable EHR-QA model, we apply an uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirm that data uncertainty is the most indicative of ambiguity in the input question.
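The abstract does not say which uncertainty decomposition is used, so the following is only a minimal sketch of one common ensemble-based variant, offered as an assumption rather than the authors' method. Total uncertainty is taken as the entropy of the averaged prediction, data uncertainty as the average entropy of the individual ensemble members, and model uncertainty as the difference between the two. The function names `entropy` and `decompose_uncertainty` and the toy probability vectors are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for a single prediction step.

    member_probs: list of probability vectors, one per ensemble member
    (hypothetical input format; all vectors must have the same length).
    Returns a tuple (total, data, model).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Average the members' distributions to get the ensemble prediction.
    mean = [sum(p[i] for p in member_probs) / n for i in range(k)]
    total = entropy(mean)                              # entropy of the mean
    data = sum(entropy(p) for p in member_probs) / n   # mean of the entropies
    model = total - data                               # disagreement between members
    return total, data, model


# Every member is individually unsure: uncertainty is attributed to the data.
ambiguous = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])

# Members are individually confident but disagree: attributed to the model.
disagreeing = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the first case puts all of the uncertainty in the data term and the second puts all of it in the model term, which illustrates why a high data term is a plausible signal for an ambiguous input question.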