Paper Title

Tropical Support Vector Machine and its Applications to Phylogenomics

Authors

Xiaoxian Tang, Houjie Wang, Ruriko Yoshida

Abstract

Most data in genome-wide phylogenetic analysis (phylogenomics) are inherently multidimensional, which poses a major challenge to both human comprehension and computational analysis. Moreover, statistical learning models from data science cannot be applied directly to a set of phylogenetic trees, because the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic analysis, we propose tropical support vector machines (SVMs). Like a classical SVM, a tropical SVM is a discriminative classifier defined by a tropical hyperplane that maximizes the minimum tropical distance from the data points to the hyperplane, so that the data points are separated into sectors (half-spaces) of the tropical projective torus. Both hard margin and soft margin tropical SVMs can be formulated as linear programming problems. We focus on classifying two categories of data, and we study a simpler case by assuming that data points from the same category ideally stay in the same sector of a tropical separating hyperplane. For hard margin tropical SVMs, we prove necessary and sufficient conditions for two categories of data points to be separated, and we give an explicit formula for the optimal value of the feasible linear programming problem. For soft margin tropical SVMs, we develop novel methods to compute an optimal tropical separating hyperplane. Computational experiments show that our methods work well. We end the paper with open problems.
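To make the quantities named in the abstract concrete, here is a minimal Python sketch of the tropical distance, the sector of a tropical hyperplane that contains a point, and the distance from a point to the hyperplane. It assumes the max-plus convention, in which the tropical hyperplane H_ω is the set of points x where max_i(ω_i + x_i) is attained at least twice, and it relies on the standard fact that the tropical distance from x to H_ω equals the gap between the largest and second-largest values of ω_i + x_i. The function names are hypothetical and chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
from typing import Sequence


def tropical_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Tropical metric on the projective torus: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def sector_index(omega: Sequence[float], x: Sequence[float]) -> int:
    """Index i at which omega_i + x_i is largest (max-plus convention, assumed).

    If the maximum is attained more than once, x lies on the hyperplane
    itself and the first maximizing index is returned.
    """
    vals = [w + a for w, a in zip(omega, x)]
    return max(range(len(vals)), key=vals.__getitem__)


def distance_to_hyperplane(omega: Sequence[float], x: Sequence[float]) -> float:
    """Gap between the largest and second-largest values of omega_i + x_i."""
    vals = sorted((w + a for w, a in zip(omega, x)), reverse=True)
    return vals[0] - vals[1]


if __name__ == "__main__":
    omega = (0, 0, 0)   # hypothetical hyperplane normal vector
    point = (3, 1, 0)   # hypothetical data point
    print(sector_index(omega, point), distance_to_hyperplane(omega, point))
```

In this sketch, a hard margin classifier in the simplified setting of the abstract would look for an ω that places each category in a single sector while making the smallest value of `distance_to_hyperplane` over all data points as large as possible. Both the sector and the distance are unchanged when the same constant is added to every coordinate of a point, which is why the functions are well defined on the tropical projective torus.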
