Paper Title
Compressing Video Calls using Synthetic Talking Heads
Authors
Abstract
We leverage modern advancements in talking head generation to propose an end-to-end system for talking head video compression. Our algorithm transmits pivot frames intermittently, while the rest of the talking head video is generated by animating them. We use a state-of-the-art face reenactment network to detect key points in the non-pivot frames and transmit them to the receiver. A dense flow is then calculated to warp a pivot frame to reconstruct the non-pivot ones. Transmitting key points instead of full frames leads to significant compression. We propose a novel algorithm to adaptively select the best-suited pivot frames at regular intervals to provide a smooth experience. We also propose a frame interpolator at the receiver's end to improve the compression levels further. Finally, a face enhancement network improves reconstruction quality, substantially improving aspects such as the sharpness of the generated frames. We evaluate our method both qualitatively and quantitatively on benchmark datasets and compare it with multiple compression techniques. We release a demo video and additional information at https://cvit.iiit.ac.in/research/projects/cvit-projects/talking-video-compression.
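To make the bandwidth argument concrete, the following is a minimal back-of-the-envelope sketch of the sender-side scheduling the abstract describes: a full pivot frame is transmitted only intermittently, and every other frame is represented solely by its detected key points. All constants (pivot interval, keypoint count, byte sizes) are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sketch of the pivot/keypoint transmission scheme.
# All sizes below are assumptions for illustration only.

PIVOT_INTERVAL = 30          # assumed: send a fresh pivot frame every 30 frames
NUM_KEYPOINTS = 10           # assumed number of key points per frame
BYTES_PER_KEYPOINT = 8       # assumed: two float32 coordinates per key point
PIVOT_FRAME_BYTES = 25_000   # assumed size of one compressed pivot frame

def payload_bytes(frame_index: int) -> int:
    """Bytes transmitted for a single frame under the pivot/keypoint scheme."""
    if frame_index % PIVOT_INTERVAL == 0:
        return PIVOT_FRAME_BYTES               # full pivot frame
    return NUM_KEYPOINTS * BYTES_PER_KEYPOINT  # key points only

def total_bytes(num_frames: int) -> int:
    """Total bytes transmitted for a clip of the given length."""
    return sum(payload_bytes(i) for i in range(num_frames))

if __name__ == "__main__":
    frames = 300  # e.g. 10 seconds at 30 fps
    scheme = total_bytes(frames)
    baseline = frames * PIVOT_FRAME_BYTES  # sending every frame in full
    print(f"pivot/keypoint scheme: {scheme} bytes")
    print(f"full-frame baseline:   {baseline} bytes")
    print(f"compression ratio:     {baseline / scheme:.1f}x")
```

Under these toy numbers, 10 pivot frames plus 290 keypoint payloads replace 300 full frames, so the savings grow with the pivot interval; the paper's adaptive pivot selection and receiver-side frame interpolation would push the ratio further.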