Paper title
Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow
Paper authors
Abstract
Context: The number of TV series on offer today is very high. With so many series available, many are canceled because a lack of originality leads to low audience numbers. Problem: A decision support system that can show why some shows are a huge success and others are not would facilitate decisions about renewing or starting a show. Solution: We studied the case of the series Arrow, broadcast by the CW Network, and used descriptive and predictive modeling techniques to predict its IMDb rating. We assumed that the theme of an episode affects how users rate it, so the dataset consists only of the episode's director, the number of reviews the episode received, the percentage of each theme extracted from the episode by a Latent Dirichlet Allocation (LDA) model, the number of viewers from Wikipedia, and the rating from IMDb. LDA is a generative probabilistic model of a collection of documents made up of words. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: Using the features of each episode, the model that best predicted the rating was CatBoost, which had a mean squared error similar to that of the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55.
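The evaluation criteria named in the abstract (root mean squared error, and a preference for the model with the steadier error when mean errors are similar) can be illustrated with a minimal sketch. This is not the authors' code: the function names `rmse` and `pick_model`, the `tolerance` parameter, and every number below are hypothetical and chosen only to show the arithmetic.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pick_model(fold_mse, tolerance=0.05):
    """Pick a model from per-fold MSE scores.

    Assumed rule (not taken from the paper): models whose mean MSE lies
    within `tolerance` of the best mean are treated as tied, and the tie
    is broken by the smallest standard deviation across folds.
    """
    summary = {
        name: (statistics.mean(scores), statistics.stdev(scores))
        for name, scores in fold_mse.items()
    }
    best_mean = min(mean for mean, _ in summary.values())
    tied = {
        name: std
        for name, (mean, std) in summary.items()
        if mean - best_mean <= tolerance
    }
    return min(tied, key=tied.get), summary


# Hypothetical ratings and predictions, for illustration only.
ratings = [8.5, 9.0, 7.5, 8.0]
predicted = [8.0, 9.5, 7.0, 8.5]
error = rmse(ratings, predicted)

# Hypothetical per-fold MSE values: similar means, different spread.
fold_mse = {
    "catboost": [0.30, 0.31, 0.29],
    "knn": [0.20, 0.40, 0.30],
}
chosen, summary = pick_model(fold_mse)
print(f"RMSE: {error:.2f}, chosen model: {chosen}")
```

Under this assumed rule, two models with nearly equal mean error are separated by how much their error varies between folds, which mirrors the kind of comparison the abstract describes between CatBoost and KNN.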