Paper title
Autoregressive Linguistic Steganography Based on BERT and Consistency Coding
Paper authors
Paper abstract
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text. How to generate a high-quality text carrying secret information is a key problem. With the widespread application of deep learning in natural language processing, recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload than much prior work. However, the security still needs to be enhanced. To tackle this problem, we propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security. In the proposed work, given a text and building on a masked LM, we use consistency coding to make up for the shortcomings of the block coding used in previous work, so that we can encode a candidate token set of arbitrary size and take advantage of the probability distribution for information hiding. The masked positions to be embedded are filled with tokens determined in an autoregressive manner, which strengthens the connection between contexts and therefore maintains the quality of the text. Experimental results show that, compared with related works, the proposed work improves the fluency of the steganographic text while guaranteeing security, and also increases the embedding payload to a certain extent.
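The abstract does not spell out how consistency coding works, so the following is only a rough sketch of the general idea it describes: filling masked positions one at a time while encoding secret bits over a candidate set of arbitrary size, using the candidates' probabilities. As an assumption, a Huffman prefix code stands in for consistency coding here (it is not claimed to be the paper's scheme), and the hypothetical `toy_candidates` function stands in for BERT's masked-LM predictions. All function names are illustrative.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix code {token: bitstring} for any number of candidates.

    Assumption: Huffman coding is used as a stand-in for consistency coding.
    """
    if len(probs) == 1:
        return {next(iter(probs)): ""}  # a single candidate carries no bits
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {tok: ""}) for tok, p in sorted(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in c0.items()}
        merged.update({t: "1" + c for t, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def toy_candidates(tokens, i):
    """Hypothetical stand-in for a masked LM: candidates for position i,
    conditioned only on the token to its left."""
    left = tokens[i - 1] if i > 0 else ""
    table = {"the": {"cat": 0.5, "dog": 0.3, "bird": 0.2}}
    return table.get(left, {"sat": 0.4, "ran": 0.3, "slept": 0.2, "sang": 0.1})


def embed(cover, masked, bits):
    """Fill masked positions left to right; return (stego tokens, bits used)."""
    tokens, pos = list(cover), 0
    for i in masked:
        code = huffman_code(toy_candidates(tokens, i))
        # Pad with zeros so a codeword always matches near the end of the stream.
        pad = bits[pos:] + "0" * max(len(c) for c in code.values())
        tok = next(t for t, c in code.items() if pad.startswith(c))
        tokens[i] = tok  # later positions see this choice (autoregressive)
        pos += len(code[tok])
    return tokens, min(pos, len(bits))


def extract(stego, masked):
    """Re-mask the text, then replay the same left-to-right filling."""
    tokens = [None if i in masked else t for i, t in enumerate(stego)]
    out = []
    for i in masked:
        code = huffman_code(toy_candidates(tokens, i))
        out.append(code[stego[i]])
        tokens[i] = stego[i]
    return "".join(out)


cover = ["the", None, None, "today"]
masked = [1, 2]  # positions assumed to be shared by sender and receiver
stego, used = embed(cover, masked, "1101")
print(stego, used, extract(stego, masked))
```

In this sketch the candidate sets have three and four tokens, which a fixed-length block code could not use fully without truncating to a power of two. More probable tokens get shorter codewords, so they are chosen more often for a random bit stream. The receiver recovers the bits only because it can rebuild the same candidate distribution at every position, which is why the positions are replayed in the same order.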