Title
Table Enrichment System for Machine Learning
Authors
Abstract
Data scientists constantly face the problem of improving prediction accuracy when the available tabular data is insufficient. We propose a table enrichment system that enriches a query table with external attributes (columns) from data lakes, thereby improving the accuracy of machine learning predictive models. Given a query table and a specified machine learning task, our system efficiently creates an enriched table in four stages: join row search, task-related table selection, row and column alignment, and feature selection and evaluation. We demonstrate the system through a web UI that showcases use cases of table enrichment.
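The four stages can be illustrated with a toy, in-memory sketch in plain Python. This is not the authors' implementation. All function names (`join_row_search`, `select_tables`, `align`, `select_features`, `enrich`) are hypothetical. Several pieces are illustrative assumptions: the key-overlap heuristic for finding join columns, the coverage threshold used to select tables, and absolute Pearson correlation as a cheap stand-in for training and evaluating a machine learning model.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]


def join_row_search(query: Table, key: str, lake: Dict[str, Table]) -> Dict[str, str]:
    """Stage 1 (assumed heuristic): per lake table, pick the column whose
    values overlap most with the query's join-key values."""
    query_keys = {row[key] for row in query}
    join_cols: Dict[str, str] = {}
    for name, table in lake.items():
        if not table:
            continue
        best_col, best_hits = None, 0
        for col in table[0]:
            hits = len(query_keys & {row.get(col) for row in table})
            if hits > best_hits:
                best_col, best_hits = col, hits
        if best_col is not None:
            join_cols[name] = best_col
    return join_cols


def select_tables(query: Table, key: str, lake: Dict[str, Table],
                  join_cols: Dict[str, str], min_coverage: float = 0.5) -> Dict[str, str]:
    """Stage 2 (assumed criterion): keep tables that cover enough query keys."""
    query_keys = {row[key] for row in query}
    selected: Dict[str, str] = {}
    for name, col in join_cols.items():
        found = query_keys & {row.get(col) for row in lake[name]}
        if len(found) / len(query_keys) >= min_coverage:
            selected[name] = col
    return selected


def align(query: Table, key: str, lake: Dict[str, Table],
          selected: Dict[str, str]) -> Tuple[Table, List[str]]:
    """Stage 3: left-join each selected table onto the query rows by key;
    unmatched rows get None. Later duplicate keys overwrite earlier ones."""
    enriched = [dict(row) for row in query]
    new_cols: List[str] = []
    for name, col in selected.items():
        index = {row.get(col): row for row in lake[name]}
        for attr in (c for c in lake[name][0] if c != col):
            new_col = f"{name}.{attr}"
            new_cols.append(new_col)
            for row in enriched:
                match = index.get(row[key])
                row[new_col] = match.get(attr) if match else None
    return enriched, new_cols


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 for degenerate inputs."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(enriched: Table, target: str, new_cols: List[str],
                    min_abs_r: float = 0.5) -> List[str]:
    """Stage 4 (stand-in): keep numeric columns correlated with the target,
    instead of training and scoring an actual ML model."""
    kept: List[str] = []
    for col in new_cols:
        pairs = [(r[col], r[target]) for r in enriched if isinstance(r[col], (int, float))]
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
        if pairs and abs(pearson(xs, ys)) >= min_abs_r:
            kept.append(col)
    return kept


def enrich(query: Table, key: str, target: str, lake: Dict[str, Table],
           min_coverage: float = 0.5, min_abs_r: float = 0.5) -> Table:
    """Run the four stages and return the query table plus kept columns."""
    join_cols = join_row_search(query, key, lake)
    selected = select_tables(query, key, lake, join_cols, min_coverage)
    aligned, new_cols = align(query, key, lake, selected)
    kept = select_features(aligned, target, new_cols, min_abs_r)
    base = list(query[0].keys())
    return [{c: row[c] for c in base + kept} for row in aligned]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    query = [{"id": k, "y": float(i + 1)} for i, k in enumerate("abcd")]
    lake = {
        "t1": [{"key": k, "x1": 10.0 * (i + 1), "noise": [5.0, 1.0][i % 2]}
               for i, k in enumerate("abcd")],
        "t2": [{"code": "z", "x2": 1.0}, {"code": "a", "x2": 7.0}],
    }
    for row in enrich(query, "id", "y", lake):
        print(row)
```

In this toy run, `t2` is dropped at stage 2 because it matches only one of four query keys. `t1.noise` is dropped at stage 4 because its correlation with `y` is weak, so only `t1.x1` is added. A real system would replace the correlation filter with model training and validation on the specified task.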