Paper title
Validating UTF-8 In Less Than One Instruction Per Byte
Paper authors
Paper abstract
The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup algorithm, which outperforms UTF-8 validation routines used in many libraries and languages by more than 10 times using commonly available SIMD instructions. To ensure reproducibility, our work is freely available as open source software.
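To make concrete what "validated on ingestion" involves, here is a minimal scalar sketch of a UTF-8 validity check. It is not the paper's lookup algorithm and uses no SIMD instructions; it is only the kind of byte-at-a-time baseline that a vectorized validator would be compared against. The function name `is_valid_utf8` is hypothetical, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Scalar, byte-at-a-time UTF-8 check (illustrative baseline only)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                  # ASCII: one byte, always valid
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:         # 2-byte lead (0xC0/0xC1 would be overlong)
            need, lo, hi = 1, 0x80, 0xBF
        elif 0xE0 <= b0 <= 0xEF:       # 3-byte lead
            need = 2
            lo = 0xA0 if b0 == 0xE0 else 0x80   # reject overlong encodings
            hi = 0x9F if b0 == 0xED else 0xBF   # reject surrogates U+D800..U+DFFF
        elif 0xF0 <= b0 <= 0xF4:       # 4-byte lead
            need = 3
            lo = 0x90 if b0 == 0xF0 else 0x80   # reject overlong encodings
            hi = 0x8F if b0 == 0xF4 else 0xBF   # reject code points above U+10FFFF
        else:
            return False               # stray continuation byte or invalid lead
        if i + need >= n:
            return False               # sequence truncated at end of input
        if not lo <= data[i + 1] <= hi:
            return False               # first continuation byte out of range
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False           # later continuation byte out of range
        i += need + 1
    return True
```

A check like this branches on every byte, which is the per-byte cost that a SIMD approach aims to avoid by classifying many bytes per instruction.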