Paper Title

PPL Bench: Evaluation Framework For Probabilistic Programming Languages

Paper Authors

Sourabh Kulkarni, Kinjal Divesh Shah, Nimar Arora, Xiaoyan Wang, Yucen Lily Li, Nazanin Khosravani Tehrani, Michael Tingley, David Noursi, Narjes Torabi, Sepehr Akhavan Masouleh, Eric Lippert, Erik Meijer

Paper Abstract

We introduce PPL Bench, a new benchmark for evaluating Probabilistic Programming Languages (PPLs) on a variety of statistical models. The benchmark includes data generation and evaluation code for a number of models, as well as implementations in some common PPLs. All of the benchmark code and PPL implementations are available on GitHub. We welcome contributions of new models and PPLs, as well as improvements to existing PPL implementations. The purpose of the benchmark is two-fold. First, we want researchers as well as conference reviewers to be able to evaluate improvements in PPLs in a standardized setting. Second, we want end users to be able to pick the PPL that is best suited for their modeling application. In particular, we are interested in evaluating the accuracy and speed of convergence of the inferred posterior. Each PPL only needs to provide posterior samples given a model and observed data. The framework automatically computes and plots the growth in predictive log-likelihood on held-out data, in addition to reporting other common metrics such as effective sample size and $\hat{R}$.
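To make the evaluation step concrete: given $n$ posterior draws $\theta_1, \dots, \theta_n$, the posterior-predictive log-likelihood of held-out data $x$ is estimated as $\log \frac{1}{n} \sum_i p(x \mid \theta_i) = \mathrm{logsumexp}_i \, \log p(x \mid \theta_i) - \log n$, and plotting this against $n$ gives the growth curve the abstract describes. The sketch below shows one way to compute that curve in Python; the names predictive_ll_growth and log_lik_fn are illustrative assumptions for this sketch, not PPL Bench's actual API.

    import numpy as np
    from scipy.special import logsumexp

    def predictive_ll_growth(posterior_samples, heldout_data, log_lik_fn):
        """Monte Carlo estimate of the predictive log-likelihood of held-out
        data as a function of how many posterior samples are used:
        log p(x_held | samples 1..n) ~= logsumexp_i(ll_i) - log(n).
        """
        # Log-likelihood of the held-out data under each posterior draw.
        per_sample_ll = np.array(
            [log_lik_fn(theta, heldout_data) for theta in posterior_samples]
        )
        n = np.arange(1, len(per_sample_ll) + 1)
        # Running logsumexp over the first n draws gives the estimate at each n.
        running = np.array([logsumexp(per_sample_ll[:i]) for i in n])
        return running - np.log(n)

    # Illustrative use: posterior draws for a Gaussian mean, N(theta, 1) likelihood.
    from scipy.stats import norm
    rng = np.random.default_rng(0)
    draws = rng.normal(0.0, 0.1, size=1000)   # stand-in posterior samples
    heldout = rng.normal(0.0, 1.0, size=50)   # held-out observations
    curve = predictive_ll_growth(
        draws, heldout, lambda th, x: norm(th, 1.0).logpdf(x).sum()
    )

The remaining diagnostics the abstract mentions have standard off-the-shelf implementations; effective sample size and $\hat{R}$, for instance, are available as arviz.ess and arviz.rhat in the ArviZ library.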
