Paper title
Physics is the New Data
Authors
Abstract
The rapid development of machine learning (ML) methods has fundamentally affected numerous applications, ranging from computer vision, biology, and medicine to accounting and text analytics. Until now, it has been the availability of large and often labeled data sets that enabled significant breakthroughs. However, the adoption of these methods in classical physical disciplines has been relatively slow, a tendency that can be traced to the intrinsic differences between the correlative approaches of purely data-based ML and the causal, hypothesis-driven nature of the physical sciences. Furthermore, anomalous behaviors of classical ML necessitate addressing issues such as the explainability and fairness of ML. We also note that the sequence in which deep learning became mainstream in different scientific disciplines (starting from medicine and biology, then moving to theoretical chemistry, and only after that to physics) is rooted in the progressively more complex level of descriptors, constraints, and causal structures available for incorporation in ML architectures. Here we put forth that over the next decade physics will become the new data, continuing the transition from the dot-coms and scientific computing concepts of the 1990s, to the big data of 2000-2010, to the deep learning of 2010-2020, and on to physics-enabled scientific ML.