Paper Title
S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction
Authors
Abstract
Snapshot compressive imaging (SCI) has emerged as a novel way of capturing hyperspectral images. It uses an optical encoder to compress the 3D data into a 2D measurement and a software decoder to reconstruct the signal. Recently, a representative SCI set-up, the coded aperture snapshot compressive imager (CASSI), paired with a Transformer reconstruction backend has achieved high-fidelity sensing performance. However, the dominant spatial and spectral attention designs show limitations in hyperspectral modeling. Spatial attention values describe the inter-pixel correlation but overlook the cross-spectral variation within each pixel. The size of the spectral attention map does not scale with the token spatial size, which bottlenecks information allocation. Moreover, CASSI entangles the spatial and spectral information into a 2D measurement, which creates a barrier to information disentanglement and modeling. In addition, CASSI blocks light with a physical binary mask, which causes masked data loss. To tackle these challenges, we propose a spatial-spectral (S^2-) Transformer implemented with a parallel attention design and a mask-aware learning strategy. First, we systematically explore the pros and cons of different spatial(-spectral) attention designs and, on that basis, find that performing both attentions in parallel disentangles and models the blended information well. Second, masked pixels are harder to predict and should be treated differently from unmasked ones. We adaptively prioritize the loss penalty attributed to the mask structure by using the mask-encoded prediction as an uncertainty estimator. We theoretically discuss the distinct convergence tendencies of the masked and unmasked regions under the proposed learning strategy. Extensive experiments demonstrate that, on average, the results of the proposed method are superior to those of the state-of-the-art method.
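The optical encoding that the abstract describes (a binary mask followed by compression of the 3D data into a 2D measurement) can be illustrated with a minimal sketch. It assumes a single-disperser CASSI layout in which each spectral band is masked, shifted sideways by a fixed number of pixels, and summed on the sensor. The function names `cassi_forward` and `make_demo`, the `step` parameter, and the toy sizes are hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def cassi_forward(cube, mask, step=1):
    """Simulate a simplified single-disperser CASSI measurement.

    cube: nested list indexed [band][row][col], the 3D hyperspectral data.
    mask: nested list indexed [row][col] with entries 0 or 1 (coded aperture).
    step: assumed dispersion shift in pixels per band (illustrative value).
    Returns a 2D list of shape rows x (cols + step * (bands - 1)).
    """
    bands, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    out_cols = cols + step * (bands - 1)
    measurement = [[0.0] * out_cols for _ in range(rows)]
    for b in range(bands):
        shift = step * b  # each band lands at a different horizontal offset
        for r in range(rows):
            for c in range(cols):
                # Pixels where mask[r][c] == 0 are blocked and contribute nothing.
                measurement[r][c + shift] += mask[r][c] * cube[b][r][c]
    return measurement


def make_demo(bands=3, rows=4, cols=5, seed=0):
    """Build a small random cube and a random binary mask for experimentation."""
    rng = random.Random(seed)
    cube = [[[rng.random() for _ in range(cols)] for _ in range(rows)]
            for _ in range(bands)]
    mask = [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    return cube, mask


if __name__ == "__main__":
    cube, mask = make_demo()
    y = cassi_forward(cube, mask)
    print(len(y), len(y[0]))  # measurement height and width
```

Each sensor pixel sums contributions from several bands at different spatial positions, which is the spatial-spectral entanglement mentioned above, and every position where the mask is zero is lost in all bands, which is the masked data loss.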