Paper Title

IMDb data from Two Generations, from 1979 to 2019; Part one, Dataset Introduction and Preliminary Analysis

Authors

Bahraminasr, M., Sadr, A. Vafaei

Abstract

"IMDb", as a user-regulated portal and one of the most visited, has provided an opportunity to create an enormous database. Analysis of the information on the Internet Movie Database (IMDb), whether related to the movie itself or provided by users, would help to reveal the determining factors in each movie's route to success. As a comprehensive dataset was lacking, we decided to create a compendious dataset for later analysis using statistical methods and machine learning models. It comprises various information provided on IMDb, such as rating data, genre, cast and crew, MPAA rating certificate, parental guide details, related movie information, posters, etc., for over 79k titles, which makes it the largest such dataset to date. The present paper is the first in a series of papers aiming at the mentioned goals. It describes the created dataset and presents a preliminary analysis, including some trends in the data, a demographic analysis of IMDb scores, and the relation of those scores to genre and MPAA rating certificate.
