Paper title
Pushing compute and AI onto detector silicon
Paper authors
Paper abstract
To take full advantage of the U.S. Department of Energy's billion-dollar investments in next-generation research infrastructure (e.g., exascale, light sources, colliders), advances are required not only in detector technology but also in computing, and specifically in AI. Consider an example from X-ray science. Nanoscale X-ray imaging is a crucial tool that enables a wide range of scientific exploration, from materials science and biology to mechanical and civil engineering. Next-generation light sources will increase X-ray beam brightness and coherent flux by 100 to 1,000 times. To image larger samples, the continuous frame rate of pixel array detectors must increase, approaching 1 MHz, which requires several Tbps of aggregate bandwidth to transfer pixel data out to a data acquisition system. Using 65-nm CMOS technology, an optimistic raw data rate off such a chip is about 100-200 Gbps. However, a continuous 1 MHz detector with only $256 \times 256$ pixels at 16-bit resolution, for example, will require about 1,000 Gbps (i.e., 1 Tbps) of bandwidth off the chip! Running multiple high-speed transceivers in parallel to provide such bandwidth is impractical, and this represents the first data bottleneck. New approaches are necessary to reduce the data size by performing data compression or AI-based feature extraction directly inside the detector silicon chip, in a streaming manner, before the data is sent off-chip.
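The bandwidth figure quoted in the abstract can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes decimal gigabits and no protocol or encoding overhead. It computes the raw off-chip data rate for a 256 x 256 pixel, 16-bit, 1 MHz detector and the reduction factor that would be needed to fit within the 100-200 Gbps range cited for 65-nm CMOS.

```python
def raw_data_rate_gbps(rows, cols, bits_per_pixel, frame_rate_hz):
    """Raw pixel data rate in Gbps (1 Gbps = 1e9 bits/s), ignoring link overhead."""
    bits_per_frame = rows * cols * bits_per_pixel
    return bits_per_frame * frame_rate_hz / 1e9


def required_reduction(raw_gbps, link_gbps):
    """Factor by which data must shrink on-chip to fit the available link rate."""
    return raw_gbps / link_gbps


# Detector parameters taken from the abstract.
rate = raw_data_rate_gbps(rows=256, cols=256, bits_per_pixel=16,
                          frame_rate_hz=1_000_000)
print(f"raw data rate: {rate:.1f} Gbps")

# Optimistic off-chip rates quoted for 65-nm CMOS.
for link in (100, 200):
    factor = required_reduction(rate, link)
    print(f"link {link} Gbps -> reduction factor of about {factor:.1f}x")
```

Under these assumptions the raw rate comes out slightly above 1 Tbps, so on-chip compression or feature extraction would need to shrink the data by roughly 5 to 10 times to fit the quoted link rates.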