Paper Title

Learning Mutual Fund Categorization using Natural Language Processing

Authors

Dimitrios Vamvourellis, Mate Attila Toth, Dhruv Desai, Dhagash Mehta, Stefano Pasquali

Abstract

The categorization of mutual funds and exchange-traded funds (ETFs) has long helped financial analysts perform peer analysis for purposes ranging from competitor analysis to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in a structured format extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data in the forms using natural language processing (NLP). We pose this as a multi-class classification problem in which the only input is the investment strategy description reported in the form and the target variable is the Lipper Global category. Using various NLP models, we show that the categorization system can indeed be learned with high accuracy. We discuss the implications and applications of our findings, as well as the limitations of existing pre-trained architectures when applied to learning fund categorization.
