Paper Title
PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment
Paper Authors
Paper Abstract
We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory. This arises whenever models are larger than memory, but also systematically when models are deployed on low-resource devices, such as in the Internet of Things, or run as Web micro-services where resources are allocated on demand. Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from external memory algorithms. The layout interleaves correlated nodes across multiple trees, uses leaf cardinality to collocate nodes on the most popular paths, and is optimized for the I/O block size. The result is that each I/O yields a higher fraction of useful data, leading to a 2-6 times reduction in classification latency for interactive workloads.
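The abstract describes a layout that interleaves correlated nodes across trees, prioritizes popular paths by leaf cardinality, and packs nodes into I/O-block-sized units. The sketch below is a minimal Python illustration of that general idea, not the authors' PACSET implementation; the Node fields, block size, node size, and popularity scoring are assumptions made for the example.

```python
# A minimal sketch of a popularity-driven, block-aware layout for a tree
# ensemble. Nodes from all trees are ordered by how often inference is
# expected to visit them (estimated from training-sample counts, i.e. leaf
# cardinality rolled up to internal nodes) and packed into fixed-size blocks,
# so the first I/Os fetch the hot path prefixes of every tree.
# BLOCK_BYTES, NODE_BYTES, and the Node layout are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

BLOCK_BYTES = 4096                      # assumed I/O block size
NODE_BYTES = 32                         # assumed serialized size of one node
NODES_PER_BLOCK = BLOCK_BYTES // NODE_BYTES

@dataclass
class Node:
    tree_id: int
    node_id: int
    n_samples: int                      # training samples reaching this node
    children: List[int] = field(default_factory=list)

def layout(trees: List[List[Node]]) -> List[List[Node]]:
    """Return a list of blocks, each holding up to NODES_PER_BLOCK nodes.

    Nodes are emitted across *all* trees in order of estimated visit
    probability, so early blocks hold the most popular nodes of every tree
    rather than one whole tree at a time.
    """
    # Visit probability of a node ~ fraction of training samples reaching it,
    # relative to its tree's root (assumed to be the first node in each list).
    scored = [
        (node.n_samples / max(tree[0].n_samples, 1), node)
        for tree in trees
        for node in tree
    ]
    scored.sort(key=lambda sn: -sn[0])  # most frequently visited nodes first

    blocks: List[List[Node]] = []
    for i in range(0, len(scored), NODES_PER_BLOCK):
        blocks.append([node for _, node in scored[i:i + NODES_PER_BLOCK]])
    return blocks
```

Because popular nodes of all trees land in the same early blocks, a single block read during inference supplies useful data for many trees at once, which is the locality effect the abstract attributes to the reported 2-6 times latency reduction.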