Paper Title
LEGAL-BERT: The Muppets straight out of Law School
Paper Authors
Paper Abstract
BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
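As a minimal sketch of the fine-tuning setup the abstract alludes to, the snippet below loads a LEGAL-BERT checkpoint with the Hugging Face transformers library and iterates over a broader hyper-parameter grid than the commonly copied BERT defaults. The model identifier "nlpaueb/legal-bert-base-uncased" and the specific grid values are assumptions for illustration, not details taken from the abstract itself.

```python
# Hedged sketch: fine-tuning a LEGAL-BERT checkpoint over a broader
# hyper-parameter grid than the original BERT recommendations.
# The model id and grid values below are assumptions, not from the paper.
import itertools
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "nlpaueb/legal-bert-base-uncased"  # assumed published checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Broader search space (BERT's original guidelines suggest lr in {2e-5..5e-5}
# and 3-4 epochs; here we also try smaller/larger learning rates and more epochs).
learning_rates = [1e-5, 2e-5, 3e-5, 5e-5, 1e-4]
batch_sizes = [16, 32]
epoch_counts = [3, 4, 8]

for lr, bs, epochs in itertools.product(learning_rates, batch_sizes, epoch_counts):
    # Re-initialise the classification head for every configuration.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
    # ... fine-tune on the downstream legal task with (lr, bs, epochs) and
    # keep the configuration with the best development-set score ...
```

In practice each grid point would be trained and scored on a development set (e.g. via the Trainer API or a custom loop), and only the best-performing configuration carried forward to test evaluation.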