Paper Title
Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks
Paper Authors
Paper Abstract
Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral pathogens. Mortality, morbidity, therapy, and quarantine resulting from BRDC are estimated to account for significant losses in the cattle industry. Early detection and management of BRDC are crucial in mitigating economic losses. Current animal disease diagnostics are based on traditional tests such as bacterial culture, serology, and Polymerase Chain Reaction (PCR) tests. Even though these tests are validated for several diseases, their main limitation is a restricted ability to detect multiple pathogens simultaneously. Advances in data analytics and machine learning, and their application to metagenome sequencing, are setting trends in several applications. In this work, we demonstrate a machine learning approach to identify pathogen signatures present in bovine metagenome sequences using k-mer-based network embedding followed by a deep learning-based classification task. With experiments conducted on two different simulated datasets, we show that network-based machine learning approaches can detect pathogen signatures with up to 89.7% accuracy. We will make the data publicly available upon request to tackle this important problem in a difficult domain.
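To make the idea of a k-mer-based network concrete, the following is a minimal sketch of one common way to turn a sequence into a graph: each overlapping k-mer becomes a node, and an edge counts how often one k-mer is immediately followed by the next. The abstract does not state the graph construction, the value of k, or the embedding method used in the paper, so the function names `kmers` and `kmer_graph`, the toy sequence, and k = 3 are all assumptions made for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_graph(seq, k):
    """Return weighted directed edges between consecutive k-mers.

    The key (u, v) means k-mer v starts one base after k-mer u;
    the value is how many times that transition occurs in seq.
    This construction is an illustrative assumption, not the paper's.
    """
    nodes = list(kmers(seq, k))
    return Counter(zip(nodes, nodes[1:]))


# Toy sequence (hypothetical); real input would be metagenome reads.
graph = kmer_graph("ACGTACGT", 3)
for (src, dst), weight in sorted(graph.items()):
    print(src, "->", dst, "weight", weight)
```

In a pipeline like the one the abstract outlines, edge weights of this kind could serve as input to a network embedding step, whose output vectors would then be passed to a classifier. Those later stages are not sketched here because the abstract gives no details about them.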