Title

SynthETIC: an individual insurance claim simulator with feature control

Authors

Avanzi, Benjamin; Taylor, Gregory Clive; Wang, Melantha; Wong, Bernard

Abstract

Recent years have seen a rapid increase in the application of machine learning methods to insurance loss reserving. These methods yield the most value when applied to large data sets, such as individual claims or large claim triangles. In short, they are likely to be useful in the analysis of any data set whose volume is sufficient to obscure a naked-eye view of its features. Unfortunately, such large data sets are in short supply in the actuarial literature. Accordingly, one needs to turn to synthetic data. Although the ultimate objective of these methods is application to real data, the use of synthetic data containing features commonly observed in real data is also to be encouraged. While there are a number of claims simulators in existence, each valuable within its own context, the inclusion of a number of desirable (but complicated) data features requires further development. Accordingly, in this paper we review those desirable features and propose a new simulator of individual claim experience called `SynthETIC`. Our simulator is publicly available, open source, and fills a gap in the non-life actuarial toolkit. The simulator specifically allows for desirable (but optionally complicated) data features typically occurring in practice, such as variations in rates of settlement and development patterns, superimposed inflation, and various discontinuities; it also enables various dependencies between variables. The user has full control of the mechanics of the evolution of an individual claim. As a result, the complexity of the data set generated (meaning the level of difficulty of analysis) may be dialled anywhere from extremely simple to extremely complex.
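
To make the idea of "dialling" complexity concrete, the following is a minimal Python sketch of a toy individual claim simulator. It is not SynthETIC's interface or algorithm: the function `simulate_claims`, the `Claim` record, the distributions, and every parameter value are assumptions chosen only for illustration. Two optional features from the abstract are exposed as switches: a dependency between claim size and settlement delay, and superimposed inflation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Claim:
    occurrence_period: int   # period in which the claim occurred
    size: float              # claim size before superimposed inflation
    settlement_delay: float  # periods from occurrence to settlement
    paid: float              # amount paid at settlement, after inflation


def simulate_claims(n_periods=10, mean_claims=20, size_delay_link=0.0,
                    superimposed_inflation=0.0, seed=0):
    """Toy individual claim simulator (hypothetical, not SynthETIC).

    size_delay_link: 0 keeps delay independent of size; larger values
        make bigger claims settle later (a simple dependency).
    superimposed_inflation: per-period rate applied up to settlement.
    """
    rng = random.Random(seed)
    claims = []
    for period in range(n_periods):
        # Claim count: binomial approximation to a Poisson frequency.
        n_claims = sum(rng.random() < mean_claims / 1000 for _ in range(1000))
        for _ in range(n_claims):
            # Assumed lognormal severity; parameters are arbitrary.
            size = rng.lognormvariate(8.0, 1.0)
            # Mean delay grows with log size only when the link is on.
            excess = max(0.0, math.log(size) - 8.0)
            mean_delay = 2.0 * (1.0 + size_delay_link * excess)
            delay = rng.expovariate(1.0 / mean_delay)
            # Inflate from time zero to the settlement time.
            paid = size * (1.0 + superimposed_inflation) ** (period + delay)
            claims.append(Claim(period, size, delay, paid))
    return claims


# Simplest setting: independent delays, no superimposed inflation.
simple = simulate_claims()
# More complex setting: size-dependent delays plus 2% inflation per period.
complex_ = simulate_claims(size_delay_link=0.5, superimposed_inflation=0.02)
```

With both switches at zero, the paid amount equals the claim size and the data are easy to analyse; turning the switches up adds the kinds of features that make a data set harder to model.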
