Paper Title
Discovering Domain Orders through Order Dependencies
Authors
Abstract
Much real-world data come with explicitly defined domain orders; e.g., lexicographic order for strings, numeric for integers, and chronological for time. Our goal is to discover implicit domain orders that we do not already know; for instance, that the order of months in the Chinese Lunar calendar is Corner < Apricot < Peach. To do so, we enhance data profiling methods by discovering implicit domain orders in data through order dependencies. We enumerate tractable special cases and proceed towards the most general case, which we prove is NP-complete. We show that the general case nevertheless can be effectively handled by a SAT solver. We also devise an interestingness measure to rank the discovered implicit domain orders, which we validate with a user study. Based on an extensive suite of experiments with real-world data, we establish the efficacy of our algorithms, and the utility of the domain orders discovered by demonstrating significant added value in three applications (data profiling, query optimization, and data mining).
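To make the idea of an implicit domain order concrete, the following minimal sketch shows one very simple tractable situation: a column with a known order (here a day number) is used to infer an order over a column whose order is unknown (here month names). This is an illustrative assumption and not the algorithm from the paper; the function name `implied_order`, the row layout, and the day numbers are all hypothetical, and only the month names come from the abstract.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def implied_order(
    rows: Sequence[Tuple], known_idx: int, target_idx: int
) -> Optional[List[Hashable]]:
    """Infer an order over the target column from a column with a known order.

    Hypothetical helper: sort the rows by the known column and read off the
    target values in order of first appearance. If a target value reappears
    after a different value has been seen, the two columns are not
    consistently ordered and None is returned.

    Assumption: ties in the known column are not handled specially.
    """
    ordered_rows = sorted(rows, key=lambda row: row[known_idx])
    order: List[Hashable] = []
    seen = set()
    for row in ordered_rows:
        value = row[target_idx]
        if order and order[-1] == value:
            continue  # still inside the same contiguous block
        if value in seen:
            return None  # value reappears later: no consistent order
        seen.add(value)
        order.append(value)
    return order


# Made-up rows: (day number, lunar month name).
rows = [(45, "Apricot"), (3, "Corner"), (70, "Peach"), (20, "Corner"), (50, "Apricot")]
print(implied_order(rows, known_idx=0, target_idx=1))
```

The general problem described in the abstract is harder than this sketch suggests: it is stated to be NP-complete and is handled there with a SAT solver, which this example does not attempt.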