Paper title
Enhanced Inversion of Schema Evolution with Provenance
Paper authors
Paper abstract
Long-term data-driven studies have become indispensable in many areas of science. Data formats, structures, and semantics often change over time as the data sets evolve. Studies spanning several decades in particular therefore have to account for changing database schemas. The evolution of these databases eventually leads to a large number of schemas, which are costly and time-consuming to store and manage. For research data to be reproducible, however, each database version must be reconstructable with little effort, so that a previously published result can be validated and reproduced at any time. Nevertheless, in many cases such an evolution cannot be fully reconstructed. This article classifies the 15 most frequently used schema modification operators and defines the associated inverse for each operator. To avoid information loss, it furthermore defines which additional provenance information has to be stored. We define four classes dealing with dangling tuples, duplicates, and provenance-invariant operators. Each class is presented by means of one representative. By using and extending the theory of schema mappings and their inverses for queries, data analysis, why-provenance, and schema evolution, we are able to combine data analysis applications with provenance under evolving database structures, enabling the reproducibility of scientific results over longer periods of time. While most inverses of schema mappings used for analysis or evolution are not exact but only quasi-inverses, adding provenance information enables us to reconstruct a sub-database of research data that is sufficient to guarantee reproducibility.
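The difference between a quasi-inverse and a provenance-supported inverse can be illustrated with a minimal Python sketch. It is not the article's formalism: the abstract does not list the 15 operators, so the column-dropping step, the function names, the sample rows, and the idea of keeping removed values in a side table keyed by tuple identifier are all assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def drop_column(rows: list[Row], column: str, key: str) -> tuple[list[Row], dict[Any, Any]]:
    """Hypothetical evolution step: remove `column` and record the removed
    values per key as additional provenance (an assumed storage scheme)."""
    side_table = {row[key]: row[column] for row in rows}
    evolved = [{k: v for k, v in row.items() if k != column} for row in rows]
    return evolved, side_table


def invert_without_provenance(rows: list[Row], column: str) -> list[Row]:
    """Quasi-inverse: the old schema is restored, but the values are unknown."""
    return [{**row, column: None} for row in rows]


def invert_with_provenance(
    rows: list[Row], column: str, key: str, side_table: dict[Any, Any]
) -> list[Row]:
    """Inverse that refills the column from the stored provenance."""
    return [{**row, column: side_table[row[key]]} for row in rows]


# Illustrative data only; not taken from the article.
original = [
    {"id": 1, "site": "A", "temp": 4.2},
    {"id": 2, "site": "B", "temp": 5.1},
]
evolved, side_table = drop_column(original, "temp", "id")
lossy = invert_without_provenance(evolved, "temp")
exact = invert_with_provenance(evolved, "temp", "id", side_table)
```

In this sketch, `lossy` has the old schema but only null values in the restored column, whereas `exact` equals `original` because the side table supplies the values that the evolution step removed.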