Paper Title

Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes

Authors

Xiang Hao, Jingxiang Chen, Shixing Chen, Ahmed Saad, Raffay Hamid

Abstract

To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug-use). Supervised models to localize these sensitive activities require large amounts of clip-level labeled data which is hard to obtain, while weakly-supervised models to this end usually do not offer competitive accuracy. To address this challenge, we propose a novel Coarse2Fine network designed to make use of readily obtainable video-level weak labels in conjunction with sparse clip-level labels of age-appropriate activities. Our model aggregates frame-level predictions to make video-level classifications and is therefore able to leverage sparse clip-level labels along with video-level labels. Furthermore, by performing frame-level predictions in a hierarchical manner, our approach is able to overcome the label-imbalance problem caused due to the rare-occurrence nature of age-appropriate content. We present comparative results of our approach using 41,234 movies and TV episodes (~3 years of video-content) from 521 sub-genres and 250 countries making it by far the largest-scale empirical analysis of age-appropriate activity localization in long-form videos ever published. Our approach offers 107.2% relative mAP improvement (from 5.5% to 11.4%) over existing state-of-the-art activity-localization approaches.
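The abstract's core mechanism is aggregating frame-level predictions into a video-level classification, so that abundant video-level weak labels and sparse clip-level labels can train the same model. The paper does not specify the aggregator here; the following is a minimal NumPy sketch of that idea using top-k mean pooling as an illustrative (assumed) choice, with a binary cross-entropy term on the pooled video score plus an optional term on the few frames that carry clip-level labels. Function names are hypothetical, not from the paper.

```python
import numpy as np

def video_level_score(frame_probs, k=8):
    """Aggregate per-frame probabilities into one video-level score by
    averaging the top-k frame scores (top-k pooling is an illustrative
    assumption, not necessarily the paper's aggregator)."""
    top_k = np.sort(np.asarray(frame_probs))[-k:]
    return float(top_k.mean())

def weak_supervision_loss(frame_probs, video_label, clip_labels=None, k=8):
    """Binary cross-entropy on the aggregated video-level score, plus an
    optional averaged term over the sparse frames with clip-level labels.
    clip_labels: dict mapping frame index -> 0/1 label (may be None)."""
    eps = 1e-7
    p = np.clip(video_level_score(frame_probs, k), eps, 1 - eps)
    loss = -(video_label * np.log(p) + (1 - video_label) * np.log(1 - p))
    if clip_labels:
        for idx, y in clip_labels.items():
            q = np.clip(frame_probs[idx], eps, 1 - eps)
            loss += -(y * np.log(q) + (1 - y) * np.log(1 - q)) / len(clip_labels)
    return float(loss)
```

The point of the pooling step: a long video that is almost entirely background but contains a short sensitive segment still produces a high video-level score, so a video-level label alone yields a useful gradient even when no clip-level annotation exists for that title.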
