Paper Title
AIBench Scenario: Scenario-distilling AI Benchmarking
Paper Authors
Abstract
Modern real-world application scenarios like Internet services consist of diverse AI and non-AI modules with huge code sizes and long, complicated execution paths, which raises serious benchmarking and evaluation challenges. Using AI component or micro benchmarks alone can lead to error-prone conclusions. This paper presents a methodology to tackle this challenge. We formalize a real-world application scenario as a Directed Acyclic Graph (DAG)-based model and propose rules to distill it into a permutation of essential AI and non-AI tasks, which we call a scenario benchmark. Together with seventeen industry partners, we extract nine typical scenario benchmarks. We design and implement an extensible, configurable, and flexible benchmarking framework, on top of which we implement two Internet service AI scenario benchmarks as proxies for two real-world application scenarios. We consider scenario, component, and micro benchmarks as three indispensable parts of evaluation. Our evaluation shows the advantage of our methodology over using component or micro AI benchmarks alone. The specifications, source code, testbed, and results are publicly available from \url{https://www.benchcouncil.org/aibench/scenario/}.
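The distillation idea described in the abstract can be sketched in a few lines: model the scenario as a DAG of modules, then take a topological order of that DAG as one valid permutation of the essential tasks. The module names and dependency edges below are hypothetical illustrations, not taken from the AIBench specification:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical Internet-service scenario as a DAG: each key is a module,
# each value is the set of modules it depends on (its predecessors).
scenario_dag = {
    "image_classification": {"data_preprocessing"},
    "object_detection": {"data_preprocessing"},
    "recommendation": {"image_classification", "object_detection"},
    "ranking": {"recommendation"},
}

# Distilling: a topological order of the DAG yields one valid permutation
# of the essential AI and non-AI tasks, i.e. a scenario benchmark skeleton.
permutation = list(TopologicalSorter(scenario_dag).static_order())
print(permutation)  # one valid execution order of the five modules
```

Any topological order is acceptable as a distilled permutation; the real methodology additionally applies rules to decide which modules are essential enough to keep.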