论文标题

森林:正则表达式的交互式多树合成器

FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions

论文作者

Ferreira, Margarida, Terra-Neves, Miguel, Ventura, Miguel, Lynce, Inês, Martins, Ruben

论文摘要

基于正则表达式的形式验证器通常在数字表单上使用,以防止用户以错误的格式插入数据。但是,编写这些验证者可能会对某些用户构成挑战。我们提出了森林,这是用于数字形式验证的正则表达合成器。森林产生一个正则表达式,与捕获组的输入值和一组条件相匹配,以确保输入中整数值的有效性。我们的综合过程基于枚举搜索,并使用满足性模型理论(SMT)求解器来探索和修剪搜索空间。我们提出了一种新颖的表示,用于正则表达式合成,多树,该形式在示例中诱导模式,并使用它们通过分裂和混合方法将问题分开。我们还提出了一个新的SMT编码,以合成给定的正则表达式的捕获条件。为了提高对合成正则表达式的信心,我们基于区分输入来实现用户交互。我们使用正则表达式在现实世界形式验证实例上评估了森林。实验结果表明,森林成功返回了72%的实例中所需的正则表达,并且优于最先进的正则表达合成器。

Form validators based on regular expressions are often used on digital forms to prevent users from inserting data in the wrong format. However, writing these validators can pose a challenge to some users. We present FOREST, a regular expression synthesizer for digital form validations. FOREST produces a regular expression that matches the desired pattern for the input values and a set of conditions over capturing groups that ensure the validity of integer values in the input. Our synthesis procedure is based on enumerative search and uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the search space. We propose a novel representation for regular expressions synthesis, multi-tree, which induces patterns in the examples and uses them to split the problem through a divide-and-conquer approach. We also present a new SMT encoding to synthesize capture conditions for a given regular expression. To increase confidence in the synthesized regular expression, we implement user interaction based on distinguishing inputs. We evaluated FOREST on real-world form-validation instances using regular expressions. Experimental results show that FOREST successfully returns the desired regular expression in 72% of the instances and outperforms REGEL, a state-of-the-art regular expression synthesizer.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源