Paper Title
Using Person Embedding to Enrich Features and Data Augmentation for Classification
Paper Authors
Abstract
Today, machine learning is applied in almost every field. Among its many methods, classification is one of the most fundamental and crucial, and a wide variety of problems can be solved with it. Feature selection during model setup is extremely important, and producing new features through feature engineering also plays a vital role in a model's success. As a case study, we build fraud detection classification models on a labeled, imbalanced dataset. Using word embedding, a natural language processing method that has also been applied in other areas, especially recommender systems, we create a customer space. The customer vectors in this space are fed to the classification models as features. Moreover, to increase the number of positive labels, rows with characteristics similar to those of positive rows, as measured by embedding-based customer similarity, are relabeled as positive. The models that incorporate embeddings, which provide a better representation of customers, are compared with other models. The results show that the customer embedding method has a positive effect on the success of the classification models.
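The two ways the abstract uses customer embeddings (as extra features and as a basis for relabeling similar rows) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the customer vectors have already been learned (for example, by a word2vec-style model trained on sequences of customer IDs). The cosine-similarity threshold rule for relabeling, the function names `enrich_features` and `relabel_similar`, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich_features(features: List[float], customer_vec: Vector) -> List[float]:
    """Feature enrichment: append the customer's embedding to the row's original features."""
    return list(features) + list(customer_vec)


def relabel_similar(
    rows: List[Tuple[str, int]],
    embeddings: Dict[str, Vector],
    threshold: float = 0.9,  # hypothetical cut-off; the abstract does not specify one
) -> List[int]:
    """Label augmentation (assumed rule): a negative row becomes positive if its
    customer's embedding has cosine similarity >= threshold with the embedding of
    any *other* customer that appears in a positive row.

    rows: (customer_id, label) pairs, label 1 = fraud, 0 = non-fraud.
    """
    positive_customers = {cid for cid, label in rows if label == 1}
    new_labels: List[int] = []
    for cid, label in rows:
        if label == 1:
            new_labels.append(1)  # original positives are kept as-is
            continue
        vec = embeddings[cid]
        is_similar = any(
            cosine(vec, embeddings[p]) >= threshold
            for p in positive_customers
            if p != cid  # skip self-comparison, which would always give 1.0
        )
        new_labels.append(1 if is_similar else 0)
    return new_labels


if __name__ == "__main__":
    # Toy 2-D customer vectors standing in for learned embeddings.
    embeddings = {"c1": [1.0, 0.0], "c2": [0.95, 0.05], "c3": [0.0, 1.0]}
    rows = [("c1", 1), ("c2", 0), ("c3", 0)]
    print(relabel_similar(rows, embeddings, threshold=0.9))  # [1, 1, 0]
    print(enrich_features([120.5, 3.0], embeddings["c2"]))  # [120.5, 3.0, 0.95, 0.05]
```

In the toy data, customer `c2` is nearly parallel to the fraudulent customer `c1` (cosine similarity about 0.999), so its row is relabeled as positive, while `c3` is orthogonal and stays negative. A nearest-neighbour rule or a tuned threshold would be equally plausible choices. The abstract does not say which criterion was used.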