Paper title
Cryptocurrency Bubble Detection: A New Stock Market Dataset, Financial Task & Hyperbolic Models
Authors
Abstract
The rapid spread of information over social media influences quantitative trading and investments. The growing popularity of speculative trading of highly volatile assets such as cryptocurrencies and meme stocks presents a fresh challenge in the financial realm. Investigating such "bubbles", periods of sudden anomalous market behavior, is critical to better understanding investor behavior and market dynamics. However, high volatility coupled with massive volumes of chaotic social media text, especially for underexplored assets like cryptocoins, poses a challenge to existing methods. Taking the first step towards NLP for cryptocoins, we present and publicly release CryptoBubbles, a novel multi-span identification task for bubble detection, and a dataset of more than 400 cryptocoins from 9 exchanges over five years spanning over two million tweets. Further, we develop a set of sequence-to-sequence hyperbolic models suited to this multi-span identification task, based on the power-law dynamics of cryptocurrencies and user behavior on social media. We further test the effectiveness of our models under zero-shot settings on a test set of Reddit posts pertaining to 29 "meme stocks", which see an increase in trade volume due to social media hype. Through quantitative, qualitative, and zero-shot analyses on Reddit and Twitter spanning cryptocoins and meme stocks, we show the practical applicability of CryptoBubbles and hyperbolic models.