Paper title
Site Reliability Engineering: Application of Item Response Theory to Application Deployment Practices and Controls
Paper authors
Paper abstract
Reliability of an application or solution in the production environment is one of the fundamental features on which every SRE team is critically focused. At the same time, achieving extreme reliability comes at a cost, which includes but is not limited to a slow pace of new feature deployments, operations cost, and opportunity cost. One earlier effort to provide an objective metric that strikes a fine balance between acceptable reliability and product velocity is the error budget and its associated policy. Each organization also has contemporary deployment guidelines and controls to ascertain the reliability of an application version deployed into customer-facing or production environments. This work proposes a new objective metric, called the Application Deployment Score, estimated using a dichotomous Item Response Theory model. The score is used to assess the improvement trend of each application version deployed into the customer-facing environment, to identify the scope for improvement of each application deployment in each area of the deployment guidelines and controls, and to adjust the error budget (i.e., the soft error budget) of an interdependent application in an application mesh by assigning soft collective responsibility. Finally, the work defines a new metric, called the deployment index, which helps assess the effectiveness of these contemporary deployment guidelines and controls in upholding the agreed SLOs of the application in customer-facing environments. This study opens a new field of research in developing new underlying latent indexes (i.e., new objective metrics) in the SRE and DevOps space.
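As a rough illustration of how a score could be estimated from a dichotomous Item Response Theory model, the sketch below treats each deployment control as a pass/fail item and estimates a latent score by maximum likelihood. The abstract does not say which dichotomous model the work uses, so the two-parameter logistic (2PL) form, the grid-search estimator, the function names (`p_pass`, `log_likelihood`, `estimate_score`), and all item parameter values are assumptions made for illustration only, not the paper's method.

```python
import math

def p_pass(theta, a, b):
    """2PL item response function (assumed model).

    Probability that a deployment with latent score `theta` satisfies a
    control with discrimination `a` and difficulty `b`.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of binary responses (1 = control met, 0 = not met)."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_pass(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def estimate_score(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of the latent score."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (discrimination, difficulty) pairs for three controls.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]

if __name__ == "__main__":
    score_v1 = estimate_score([1, 0, 0], ITEMS)  # one control met
    score_v2 = estimate_score([1, 1, 0], ITEMS)  # two controls met
    print(f"version 1 score: {score_v1:.2f}")
    print(f"version 2 score: {score_v2:.2f}")
```

Under these assumptions, a version that meets more (or more discriminating) controls receives a higher estimated score, which is the kind of per-version trend the abstract describes. In practice the item parameters would themselves be estimated from many deployments rather than fixed by hand.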