Paper title
RankDVQA: Deep VQA based on Ranking-inspired Hybrid Training
Paper authors
Paper abstract
In recent years, deep learning techniques have shown significant potential for improving video quality assessment (VQA), achieving higher correlation with subjective opinions compared to conventional approaches. However, the development of deep VQA methods has been constrained by the limited availability of large-scale training databases and ineffective training methodologies. As a result, it is difficult for deep VQA approaches to achieve consistently superior performance and model generalization. In this context, this paper proposes new VQA methods based on a two-stage training methodology which motivates us to develop a large-scale VQA training database without employing human subjects to provide ground truth labels. This method was used to train a new transformer-based network architecture, exploiting quality ranking of different distorted sequences rather than minimizing the difference from the ground-truth quality labels. The resulting deep VQA methods (for both full reference and no reference scenarios), FR- and NR-RankDVQA, exhibit consistently higher correlation with perceptual quality compared to the state-of-the-art conventional and deep VQA methods, with average SROCC values of 0.8972 (FR) and 0.7791 (NR) over eight test sets without performing cross-validation. The source code of the proposed quality metrics and the large training database are available at https://chenfeng-bristol.github.io/RankDVQA.