Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

Paper Title

Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

Authors

Eunice Jun, Audrey Seo, Jeffrey Heer, René Just

Abstract

Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed-effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
