Self-Supervised Log Parsing

Paper Title

Self-Supervised Log Parsing

Authors

Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao

Abstract

Logs are used extensively during the development and maintenance of software systems. They record runtime events and allow code execution to be traced, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. These are often specialized for parsing certain log types, which limits their performance scores and generalization. We propose a novel parsing technique called NuLog that uses a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the MLM, used as pre-training, to be coupled with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy, with an average of 99%, and achieves the lowest edit distance to the ground-truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach to support log-based anomaly detection in both supervised and unsupervised scenarios. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.
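
To make the idea of parsing as masked-token prediction more concrete, here is a minimal sketch. It is not NuLog's model or its decision rule: the learned MLM is replaced by a toy count-based predictor over each token's immediate neighbors. The function names (`contexts`, `fit`, `parse`), the `<*>` wildcard, the `threshold` parameter, and the sample log lines are all assumptions made for illustration. Each token is treated as masked in turn; if it is easy to predict from its context it is kept as part of the template, otherwise it is replaced by the wildcard.

```python
from collections import Counter, defaultdict

PAD_LEFT, PAD_RIGHT, WILDCARD = "<s>", "</s>", "<*>"


def contexts(tokens):
    """Yield (left, right, token) for each position, as if that token were masked."""
    padded = [PAD_LEFT] + tokens + [PAD_RIGHT]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i + 1], padded[i]


def fit(logs):
    """Toy stand-in for MLM training: count which tokens fill each (left, right) context."""
    counts = defaultdict(Counter)
    for line in logs:
        for left, right, token in contexts(line.split()):
            counts[(left, right)][token] += 1
    return counts


def parse(line, counts, threshold=0.5):
    """Keep a token if its share of the observed fillers for its context reaches
    the (hypothetical) threshold; otherwise mark it as a variable parameter."""
    template = []
    for left, right, token in contexts(line.split()):
        seen = counts.get((left, right), Counter())
        total = sum(seen.values())
        share = seen[token] / total if total else 0.0
        template.append(token if share >= threshold else WILDCARD)
    return " ".join(template)


# Hypothetical log lines, invented for this example.
logs = [
    "Deleting block blk_1 file /tmp/a",
    "Deleting block blk_2 file /tmp/b",
    "Deleting block blk_3 file /tmp/c",
]
model = fit(logs)
print(parse(logs[0], model))  # Deleting block <*> file <*>
```

Because this stand-in matches neighbors exactly, it generalizes poorly to lines it has not seen; a learned model that embeds the whole message, as the abstract describes, is what would make the approach practical.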
