Title
FOWD: A Free Ocean Wave Dataset for Data Mining and Machine Learning
Authors
Abstract
The occurrence of extreme (rogue) waves in the ocean is for the most part still shrouded in mystery, as the rare nature of these events makes them difficult to analyze with traditional methods. Modern data mining and machine learning methods provide a promising way out, but they typically rely on the availability of massive amounts of well-cleaned data. To facilitate the application of such data-hungry methods to surface ocean waves, we developed FOWD, a freely available wave dataset and processing framework. FOWD describes the conversion of raw observations into a catalogue that maps characteristic sea state parameters to observed wave quantities. Specifically, we employ a running window approach that respects the non-stationary nature of the oceans, and extensive quality control to reduce bias in the resulting dataset. We also supply a reference Python implementation of the FOWD processing toolkit, which we use to process the entire CDIP buoy data catalogue containing over 4 billion waves. In a first experiment, we find that, when the full elevation time series is available, surface elevation kurtosis and maximum wave height are the strongest univariate predictors for rogue wave activity. When just a spectrum is given, crest-trough correlation, spectral bandwidth, and mean period fill this role.
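The abstract describes a running window that maps sea state parameters to the individual waves observed. The NumPy sketch below illustrates that idea. It is not the FOWD reference implementation, and several choices are assumptions made only for illustration:

- waves are defined by zero upcrossings;
- significant wave height is estimated as Hs = 4σ of the preceding window;
- the window length is 30 minutes;
- a wave counts as rogue when H/Hs > 2.

The function names are hypothetical.

```python
import numpy as np


def upcrossing_waves(eta):
    """Split an elevation record into zero-upcrossing waves (assumed wave definition).

    Returns (start, end, height) tuples; height is crest minus trough
    elevation within samples start..end-1.
    """
    eta = np.asarray(eta, dtype=float)
    # Index of the first non-negative sample after each upward zero crossing
    up = np.flatnonzero((eta[:-1] < 0) & (eta[1:] >= 0)) + 1
    return [
        (int(s), int(e), float(eta[s:e].max() - eta[s:e].min()))
        for s, e in zip(up[:-1], up[1:])
    ]


def sea_state(window):
    """Sea state parameters of one elevation window (illustrative definitions)."""
    w = np.asarray(window, dtype=float)
    w = w - w.mean()
    sigma = w.std()
    return {
        "hs": float(4.0 * sigma),  # significant wave height estimated as 4 * sigma
        "kurtosis": float(np.mean(w**4) / sigma**4),  # equals 3 for Gaussian seas
    }


def running_window_catalogue(eta, sample_rate_hz, window_minutes=30.0, rogue_ratio=2.0):
    """Pair every wave with the sea state of the window that precedes it.

    window_minutes and rogue_ratio are assumed values, not FOWD settings.
    """
    eta = np.asarray(eta, dtype=float)
    n_window = int(window_minutes * 60 * sample_rate_hz)
    rows = []
    for start, end, height in upcrossing_waves(eta):
        if start < n_window:
            continue  # not enough history to characterise the sea state
        state = sea_state(eta[start - n_window:start])
        rel_height = height / state["hs"]
        rows.append({
            "start": start,
            "height": height,
            "hs": state["hs"],
            "kurtosis": state["kurtosis"],
            "rel_height": rel_height,
            "is_rogue": bool(rel_height > rogue_ratio),
        })
    return rows


if __name__ == "__main__":
    # Synthetic 1 Hz record: regular 20 s swell with one wave amplified 3x
    t = np.arange(3000)
    eta = np.sin(2 * np.pi * (t + 0.5) / 20)
    eta[2000:2020] *= 3.0
    rows = running_window_catalogue(eta, sample_rate_hz=1.0)
    print([r["start"] for r in rows if r["is_rogue"]])  # [2000]
```

The sketch estimates Hs and kurtosis only from samples that come before each wave. As a result, the wave being classified never inflates its own reference height. Each row pairs window-level predictors, such as kurtosis, with a per-wave target, such as `is_rogue`. This is the kind of table a univariate predictor analysis would consume.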