Paper Title

Hidden bawls, whispers, and yelps: can text be made to sound more than just its words?

Paper Authors

Caluã de Lacerda Pataca, Paula Dornhofer Paro Costa

Paper Abstract

Whether a word was bawled, whispered, or yelped, captions will typically represent it in the same way. If they are your only way to access what is being said, subjective nuances expressed in the voice will be lost. Since so much of communication is carried by these nuances, we posit that if captions are to be used as an accurate representation of speech, embedding visual representations of paralinguistic qualities into captions could help readers use them to better understand speech beyond its mere textual content. This paper presents a model for processing vocal prosody (its loudness, pitch, and duration) and mapping it into visual dimensions of typography (respectively, font-weight, baseline shift, and letter-spacing), creating a visual representation of these lost vocal subtleties that can be embedded directly into the typographical form of text. An evaluation was carried out where participants were exposed to this speech-modulated typography and asked to match it to its originating audio, presented between similar alternatives. Participants (n=117) were able to correctly identify the original audios with an average accuracy of 65%, with no significant difference when showing them modulations as animated or static text. Additionally, participants' comments showed their mental models of speech-modulated typography varied widely.
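To make the prosody-to-typography mapping described in the abstract more concrete, the sketch below shows one way per-word loudness, pitch, and duration values could be turned into font-weight, baseline shift, and letter-spacing, emitted as inline CSS. The `WordProsody` class, the [0, 1] normalization, and the specific output ranges are illustrative assumptions, not the authors' model or parameter choices.

```python
from dataclasses import dataclass


@dataclass
class WordProsody:
    """Per-word prosodic features, assumed to be pre-extracted from audio
    and normalized to [0, 1] (hypothetical preprocessing, not from the paper)."""
    text: str
    loudness: float  # e.g. RMS energy
    pitch: float     # e.g. mean fundamental frequency (F0)
    duration: float  # e.g. word length relative to the speaker's average


def lerp(lo: float, hi: float, t: float) -> float:
    """Linearly interpolate between lo and hi, clamping t to [0, 1]."""
    return lo + (hi - lo) * max(0.0, min(1.0, t))


def to_css_span(word: WordProsody) -> str:
    """Map prosody onto the typographic dimensions named in the abstract:
    loudness -> font-weight, pitch -> baseline shift, duration -> letter-spacing.
    The numeric ranges below are arbitrary illustrative choices."""
    weight = int(lerp(300, 900, word.loudness))   # light to black
    baseline = lerp(-0.3, 0.3, word.pitch)        # baseline shift in em
    tracking = lerp(0.0, 0.25, word.duration)     # letter-spacing in em
    return (
        f'<span style="font-weight:{weight};'
        f'vertical-align:{baseline:.2f}em;'
        f'letter-spacing:{tracking:.2f}em">{word.text}</span>'
    )


if __name__ == "__main__":
    words = [
        WordProsody("Hidden", 0.2, 0.4, 0.5),
        WordProsody("bawls", 0.9, 0.3, 0.7),
        WordProsody("whispers", 0.1, 0.6, 0.8),
        WordProsody("yelps", 0.8, 0.9, 0.2),
    ]
    print(" ".join(to_css_span(w) for w in words))
```

A simple linear interpolation keeps the example readable; the paper's actual model may normalize, scale, or animate these mappings differently (the evaluation compared animated and static renderings).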
