Paper Title
CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding
Paper Authors
Paper Abstract
Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author has to watch through the video and manually check for inaccessible information frame-by-frame, in both the visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures the accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate and review issues, script AD/CC in place, and immediately preview the described and captioned video. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.
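The abstract does not spell out how modality asymmetry is scored, so the following is only a minimal, hypothetical sketch of the general idea: each visual segment is compared against the audio/transcript segments in a shared embedding space (e.g., produced by a CLIP-style cross-modal encoder), and segments with no well-matched counterpart in the other modality are flagged as candidates for AD (or, symmetrically, CC). The function and threshold names are illustrative, and random placeholder embeddings are used so the example runs as-is; this is not the CrossA11y implementation.

```python
# Illustrative sketch of modality-asymmetry flagging (assumed approach, not
# the paper's actual algorithm). Embeddings would normally come from a
# cross-modal model; random vectors stand in so the script is self-contained.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def flag_asymmetric_segments(visual_embs, audio_embs, threshold=0.25):
    """Return (index, best_similarity) for visual segments whose best match
    against any audio/transcript segment falls below `threshold`
    (a hypothetical tuning parameter)."""
    flagged = []
    for i, v in enumerate(visual_embs):
        best = max(cosine(v, a) for a in audio_embs)
        if best < threshold:
            flagged.append((i, best))
    return flagged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder per-segment features; in practice these would be visual
    # frame embeddings and speech-transcript embeddings from the same model.
    visual_embs = [rng.normal(size=512) for _ in range(6)]
    audio_embs = [rng.normal(size=512) for _ in range(6)]

    for idx, score in flag_asymmetric_segments(visual_embs, audio_embs):
        print(f"Visual segment {idx}: weak cross-modal grounding "
              f"(max similarity {score:.2f}) -> candidate for audio description")
```

The same comparison run in the opposite direction (audio segments scored against visual segments) would surface candidates for captions, mirroring the visual/auditory symmetry described in the abstract.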