Paper Title

Single chip photonic deep neural network with accelerated training

Authors

Saumil Bandyopadhyay, Alexander Sludds, Stefan Krastanov, Ryan Hamerly, Nicholas Harris, Darius Bunandar, Matthew Streshinsky, Michael Hochberg, Dirk Englund

Abstract

As deep neural networks (DNNs) revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of CMOS electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays, and optical accelerators. Optical systems can perform linear matrix operations at exceptionally high rate and efficiency, motivating recent demonstrations of low latency linear algebra and optical energy consumption below a photon per multiply-accumulate operation. However, demonstrating systems that co-integrate both linear and nonlinear processing units in a single chip remains a central challenge. Here we introduce such a system in a scalable photonic integrated circuit (PIC), enabled by several key advances: (i) high-bandwidth and low-power programmable nonlinear optical function units (NOFUs); (ii) coherent matrix multiplication units (CMXUs); and (iii) in situ training with optical acceleration. We experimentally demonstrate this fully-integrated coherent optical neural network (FICONN) architecture for a 3-layer DNN comprising 12 NOFUs and three CMXUs operating in the telecom C-band. Using in situ training on a vowel classification task, the FICONN achieves 92.7% accuracy on a test set, which is identical to the accuracy obtained on a digital computer with the same number of weights. This work lends experimental evidence to theoretical proposals for in situ training, unlocking orders of magnitude improvements in the throughput of training data. Moreover, the FICONN opens the path to inference at nanosecond latency and femtojoule per operation energy efficiency.
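The architecture described above interleaves coherent matrix multiplication units (CMXUs) with nonlinear optical function units (NOFUs) across three layers. The following is a minimal numerical sketch of that forward pass, assuming CMXUs can be modeled as unitary matrices acting on complex optical amplitudes and NOFUs as an intensity-dependent attenuation; the mode count, the specific nonlinearity, and all function names here are illustrative assumptions, not the paper's device model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # number of optical modes (illustrative; the device uses more)

def random_unitary(n, rng):
    """Draw a Haar-random unitary via QR decomposition (stand-in for a
    programmed CMXU; the real device sets this via phase shifters)."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases for uniformity

def nofu(x, alpha=0.5):
    """Toy electro-optic nonlinearity: intensity-dependent attenuation.
    (Assumed functional form, not the paper's measured response.)"""
    return x * np.exp(-alpha * np.abs(x) ** 2)

# Three CMXU layers, each followed by a bank of NOFUs, as in the 3-layer DNN.
cmxus = [random_unitary(N, rng) for _ in range(3)]

def ficonn_forward(x):
    for U in cmxus:
        x = nofu(U @ x)
    return np.abs(x) ** 2  # photodetection reads out optical intensity

x_in = np.ones(N, dtype=complex) / np.sqrt(N)  # normalized input amplitudes
probs = ficonn_forward(x_in)
```

Because each CMXU is unitary, the linear steps conserve optical power; only the NOFU stage and the final photodetection change total intensity, which is the structural point of co-integrating the two unit types on one chip.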
