Training from Zero: Radio Frequency Machine Learning Data Quantity Forecasting

Paper Title

Training from Zero: Radio Frequency Machine Learning Data Quantity Forecasting

Authors

Clark IV, William H., Michaels, Alan J.

Abstract

The data used during training in any given application space is directly tied to the performance of the system once deployed. While many other factors go into producing high-performance machine learning models, there is no doubt that the training data provides the foundation on which a system is built. One of the rule-of-thumb heuristics underlying the machine learning field is that more data leads to better models, but there is no easy answer to the question, "How much data is needed?" This work examines a modulation classification problem in the radio frequency (RF) domain, attempting to answer how much training data is required to achieve a desired level of performance, though the procedure readily applies to classification problems across modalities. The ultimate goal is an approach that requires the least amount of data collection while still informing a more thorough collection effort aimed at the desired performance metric. While this approach requires an initial dataset that is germane to the problem space, acting as a target dataset from which metrics are extracted, the goal is for that initial dataset to be orders of magnitude smaller than the amount required to deliver a system that achieves the desired performance. An additional benefit of the techniques presented here is that the quality of different datasets can be numerically evaluated and tied to the quantity of data and, ultimately, to the performance of the architecture in the problem domain.
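To make the "how much data is needed?" question concrete, the sketch below shows one common, generic way to forecast data quantity: fit a power-law learning curve, error ≈ a * n^(-b), to error rates measured at a few small training-set sizes, then invert it for a target error. This is NOT the method of Clark and Michaels (the abstract does not describe their procedure in detail); the functions fit_power_law and samples_needed are hypothetical helpers, and the power-law form is an assumption that may not hold for a given RF dataset.

```python
import math

# Hypothetical helpers: a generic learning-curve baseline, not the paper's method.
# Assumes test error follows error ~= a * n**(-b) over the range of sizes measured.

def fit_power_law(sizes, errors):
    """Fit error ~= a * n**(-b) by least squares on log(error) vs. log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line, equals -b
    intercept = mean_y - slope * mean_x    # intercept, equals log(a)
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert error = a * n**(-b) to estimate the n that reaches target_error."""
    return math.ceil((a / target_error) ** (1.0 / b))


# Synthetic, illustrative measurements (not results from the paper).
sizes = [100, 400, 1600, 6400]
errors = [0.2, 0.1, 0.05, 0.025]
a, b = fit_power_law(sizes, errors)
print(a, b, samples_needed(a, b, 0.01))
```

With these synthetic points the fit recovers a ≈ 2.0 and b ≈ 0.5, so reaching a 1% error rate is forecast to need roughly 40,000 training examples. Extrapolating far beyond the measured sizes is exactly where such a simple curve is least reliable, which is one reason a more principled estimate from a small target dataset is attractive.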
