Paper Title
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Paper Authors
Paper Abstract
Several services for people with visual disabilities have emerged recently due to achievements in the Assistive Technology and Artificial Intelligence areas. Despite the growth in the availability of assistive systems, there is a lack of services that support specific tasks, such as understanding the context of images presented in online content, e.g., webinars. Image captioning techniques and their variants are limited as Assistive Technologies because the descriptions they generate do not match the needs of visually impaired people. We propose an approach for generating the context of webinar images that combines a dense captioning technique with a set of filters, to fit the captions to our domain, and a language model for the abstractive summarization task. The results demonstrate that, by combining image analysis methods and neural language models, we can produce descriptions with higher interpretability that focus on the information relevant to this group of people.
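The pipeline outlined in the abstract could be sketched as below. This is only an illustrative assumption of the flow, not the authors' implementation: the dense captioning model and the neural abstractive summarizer are replaced by stand-ins (a hard-coded caption list, a keyword filter, and simple concatenation), and the `DOMAIN_KEYWORDS` set is hypothetical.

```python
# Hypothetical sketch: dense captions are filtered to the webinar domain,
# then condensed into a single description. Real systems would use a
# dense captioning model and a neural language model instead of these stubs.

DOMAIN_KEYWORDS = {"person", "slide", "screen", "speaker", "text"}  # assumed

def filter_captions(captions, keywords=DOMAIN_KEYWORDS):
    """Keep only captions mentioning at least one domain keyword."""
    return [c for c in captions if keywords & set(c.lower().split())]

def summarize(captions):
    """Placeholder for abstractive summarization: joins kept captions."""
    return " ".join(captions)

if __name__ == "__main__":
    # Example output a dense captioning model might produce for a webinar frame
    dense_captions = [
        "a person standing near a screen",
        "a red car parked outside",        # off-domain, filtered out
        "a slide with text on the screen",
    ]
    print(summarize(filter_captions(dense_captions)))
```

The filtering step is where the approach adapts generic dense captions to the webinar domain; the summarization step then merges the surviving captions into one inclusive paragraph.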