Paper title

Better Transcription of UK Supreme Court Hearings

Authors

Hadeel Saadany, Catherine Breslin, Constantin Orăsan, Sophie Walker

Abstract

Transcription of legal proceedings is very important for enabling access to justice. However, speech transcription is an expensive and slow process. In this paper we describe part of a combined research and industrial project to build an automated transcription tool designed specifically for the justice sector in the UK. We explain the challenges involved in transcribing courtroom hearings and the Natural Language Processing (NLP) techniques we employ to tackle them. We show that fine-tuning a generic off-the-shelf pre-trained Automatic Speech Recognition (ASR) system with an in-domain language model, together with infusing common phrases extracted by a collocation detection model, not only improves the Word Error Rate (WER) of the transcribed hearings but also avoids critical errors specific to the legal jargon and terminology commonly used in British courts.
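
For readers unfamiliar with the metric named in the abstract, WER is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the example sentences are assumptions made up for this illustration, and no text normalisation (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; real evaluations usually
    normalise casing and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one reference word is dropped by the recogniser.
score = word_error_rate("the court of appeal", "the court appeal")
```

Because WER weights every word equally, a single misrecognised legal term counts the same as a misrecognised function word, which is consistent with the abstract treating critical errors in legal terminology as a separate concern from the overall WER.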
