Paper title
Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features
Paper authors
Paper abstract
On the basis of functional magnetic resonance imaging (fMRI), researchers have devoted themselves to designing visual encoding models that predict human neuronal activity in response to presented image stimuli, and to analyzing the inner mechanisms of the human visual cortices. A deep network structure composed of hierarchical processing layers becomes a deep network model by learning task-specific features of data from a large dataset. Deep network models provide powerful, hierarchical representations of data and have brought breakthroughs to visual encoding, while also revealing a hierarchical structural similarity to the way information is processed in the human visual cortices. However, previous studies have almost exclusively used image features from deep network models pre-trained on the classification task to construct visual encoding models. Besides the deep network structure, the task and its corresponding large dataset are also important for deep network models, yet they have been neglected by previous studies. Because image classification is a relatively fundamental task, it can hardly guide deep network models to master high-level semantic representations of data, which limits encoding performance for the high-level visual cortices. In this study, we introduce a higher-level vision task, the image captioning (IC) task, and propose a visual encoding model based on IC features (ICFVEM) to encode voxels of the high-level visual cortices. Experiments demonstrate that ICFVEM obtains better encoding performance than previous deep network models pre-trained on the classification task. In addition, voxels are interpreted through the visualization of semantic words to explore their detailed characteristics, and a comparative analysis implies that the high-level visual cortices exhibit a correlated representation of image content.
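The abstract does not specify ICFVEM's implementation, but fMRI visual encoding models of this kind are commonly built as voxel-wise linear (ridge) regressions from deep network features to measured voxel responses, evaluated by the Pearson correlation between predicted and measured responses. The sketch below illustrates that generic pipeline only; the feature matrix stands in for IC-model activations, the data are synthetic, and all names are hypothetical rather than taken from the paper.

```python
import numpy as np

# Synthetic stand-ins: rows = stimulus images, columns = features / voxels.
# In a real study, X_* would hold features extracted from an image-caption
# model and Y_* the fMRI voxel responses to the same images.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 64, 10

X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))        # hidden generative weights
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_vox))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_vox))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def voxel_correlations(Y_pred, Y_true):
    """Pearson correlation per voxel between predicted and measured responses."""
    yp = Y_pred - Y_pred.mean(axis=0)
    yt = Y_true - Y_true.mean(axis=0)
    return (yp * yt).sum(axis=0) / (
        np.linalg.norm(yp, axis=0) * np.linalg.norm(yt, axis=0))

W = fit_ridge(X_train, Y_train)          # one linear model per voxel
r = voxel_correlations(X_test @ W, Y_test)
print(r.shape, round(float(r.mean()), 3))
```

On held-out images, each voxel's correlation `r` serves as its encoding performance; comparing these scores across feature sources (e.g., classification features vs. caption features) is the kind of analysis the abstract describes.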