Paper Title

Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding

Paper Authors

Chen, Zijiao; Qing, Jiaxin; Xiang, Tiange; Yue, Wan Lin; Zhou, Juan Helen

Paper Abstract

Decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and build a solid foundation for bridging human and computer vision through the Brain-Computer Interface. However, reconstructing high-quality images with correct semantics from brain recordings is a challenging problem due to the complex underlying representations of brain signals and the scarcity of data annotations. In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. Firstly, we learn an effective self-supervised representation of fMRI data using mask modeling in a large latent space inspired by the sparse coding of information in the primary visual cortex. Then by augmenting a latent diffusion model with double-conditioning, we show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations. We benchmarked our model qualitatively and quantitatively; the experimental results indicate that our method outperformed state-of-the-art in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41% respectively. An exhaustive ablation study was also conducted to analyze our framework.
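The self-supervised pretraining stage described above (masking most of the fMRI signal and learning to reconstruct the hidden parts) can be sketched at a high level as follows. This is a minimal illustrative sketch only: the patch count, patch dimension, latent width, 0.75 mask ratio, and the random linear "encoder"/"decoder" stand-ins are all assumptions for demonstration, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes; MinD-Vis's real patching and latent size differ.
n_patches, patch_dim, latent_dim = 16, 32, 8
mask_ratio = 0.75  # masked modeling hides most patches during pretraining

# Pretend fMRI recording, split into patches of voxel activations.
signal = rng.standard_normal((n_patches, patch_dim))

# Randomly partition patches into masked (hidden) and visible sets.
n_masked = int(mask_ratio * n_patches)
perm = rng.permutation(n_patches)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

# Stand-in encoder/decoder: random linear maps into and out of a latent space.
W_enc = rng.standard_normal((patch_dim, latent_dim)) * 0.1
W_dec = rng.standard_normal((latent_dim, patch_dim)) * 0.1

latent = signal[visible_idx] @ W_enc   # encode only the visible patches
recon = latent.mean(axis=0) @ W_dec    # crude pooled reconstruction of a patch

# Self-supervised objective: reconstruction error on the *masked* patches,
# so the model must infer hidden activity from the sparse visible context.
loss = float(np.mean((signal[masked_idx] - recon) ** 2))
print(loss > 0)
```

The key design point the abstract highlights is that the loss is computed only on masked patches, which forces the learned latent space to capture structure in the brain signal rather than copy the input.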
