Historical Credibility for Movie Reviews and Its Application to Weakly Supervised Classification

Paper Title

Historical Credibility for Movie Reviews and Its Application to Weakly Supervised Classification

Authors

Kim, Min-Seon, Lim, Bo-Young, Shin, Han-Sub, Kwon, Hyuk-Yoon

Abstract

In this study, we address the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot judge the credibility of a movie review clearly and efficiently, and the number of movie reviews is very large. To tackle this problem, we propose historical credibility, which judges the credibility of a review based on the historical ratings and textual reviews written by its reviewer. To this end, we present three criteria that clearly classify reviews as trusted or distrusted. We validate the effectiveness of the proposed historical credibility through extensive analysis. Specifically, we show that trusted and distrusted reviews have clearly distinguishable characteristics from three viewpoints: 1) distribution, 2) statistics, and 3) correlation. Then, we apply historical credibility to a weakly supervised model that classifies a given review as trusted or distrusted. First, we show that this approach is highly efficient because the entire dataset is annotated according to the predefined criteria. Indeed, it annotates 6,400 movie reviews in only 0.093 seconds, which accounts for only 0.55%–1.88% of the total learning time when LSTM and SVM are used as the learning models. Second, we show that the historical credibility-based classification model clearly outperforms the textual review-based classification model: the former's classification accuracy exceeds the latter's by up to 11.7%–13.4%. In addition, we confirm that our classification model achieves higher accuracy as the data size increases.
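The abstract says the whole dataset is labeled automatically by three predefined criteria computed from each reviewer's history, but it does not state what those criteria are. The sketch below illustrates the general weak-supervision pattern under that assumption only: the three rules (rating deviation from the movie average, share of extreme ratings, average text length), their thresholds, and all names such as `ReviewerHistory` and `weak_label` are hypothetical stand-ins, not the authors' criteria. The rules are combined by a simple majority vote, a common but swapped-in aggregation choice; the resulting labels would then serve as training targets for a classifier such as an LSTM or SVM.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Tuple

# Label values; a rule returns None when it abstains.
TRUSTED, DISTRUSTED = 1, 0


@dataclass
class ReviewerHistory:
    """Hypothetical summary of one reviewer's past activity."""
    ratings: List[float]      # reviewer's past ratings (1-10 scale assumed)
    movie_means: List[float]  # average rating of each corresponding movie
    text_lengths: List[int]   # character length of each past textual review


def rule_rating_deviation(h: ReviewerHistory) -> Optional[int]:
    # Hypothetical: ratings close to the consensus -> trusted, far off -> distrusted.
    dev = mean(abs(r - m) for r, m in zip(h.ratings, h.movie_means))
    if dev <= 1.5:
        return TRUSTED
    if dev >= 4.0:
        return DISTRUSTED
    return None


def rule_extreme_ratings(h: ReviewerHistory) -> Optional[int]:
    # Hypothetical: a history dominated by 1s and 10s looks suspicious.
    share = sum(r in (1, 10) for r in h.ratings) / len(h.ratings)
    return DISTRUSTED if share >= 0.8 else None


def rule_text_effort(h: ReviewerHistory) -> Optional[int]:
    # Hypothetical: consistently substantive text suggests a careful reviewer.
    return TRUSTED if mean(h.text_lengths) >= 100 else None


RULES: List[Callable[[ReviewerHistory], Optional[int]]] = [
    rule_rating_deviation,
    rule_extreme_ratings,
    rule_text_effort,
]


def weak_label(h: ReviewerHistory) -> Optional[int]:
    """Majority vote over the rules that did not abstain; ties -> None."""
    votes = [v for v in (rule(h) for rule in RULES) if v is not None]
    n_trusted, n_distrusted = votes.count(TRUSTED), votes.count(DISTRUSTED)
    if n_trusted == n_distrusted:
        return None
    return TRUSTED if n_trusted > n_distrusted else DISTRUSTED


def annotate(histories: List[ReviewerHistory]) -> List[Tuple[ReviewerHistory, int]]:
    """Label a whole dataset in one pass, dropping undecided items."""
    labeled = [(h, weak_label(h)) for h in histories]
    return [(h, y) for h, y in labeled if y is not None]


careful = ReviewerHistory([7, 8, 6], [7.5, 7.0, 6.5], [220, 180, 150])
troll = ReviewerHistory([1, 10, 1, 1], [7.8, 6.5, 8.1, 7.2], [12, 8, 20, 5])
middling = ReviewerHistory([5, 9], [7.0, 6.0], [40, 60])

print(weak_label(careful))   # 1 (TRUSTED)
print(weak_label(troll))     # 0 (DISTRUSTED)
print(weak_label(middling))  # None (no rule fired)
print(len(annotate([careful, troll, middling])))  # 2
```

Because each rule is a cheap function of precomputed history, labeling is a single linear pass over the data with no human in the loop. This is at least consistent with the abstract's point that annotation takes only a small fraction of the total learning time, though the actual criteria and their cost may differ.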
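As a rough check on the efficiency claim, read the reported percentages p as annotation time divided by total learning time. The abstract's numbers then imply the following back-of-the-envelope bounds. The abstract does not say which percentage belongs to which model, so only the range is meaningful:

$$
T_{\text{total}} = \frac{t_{\text{annot}}}{p}, \qquad t_{\text{annot}} = 0.093\ \text{s}
$$

$$
\frac{0.093\ \text{s}}{0.0188} \approx 4.95\ \text{s}, \qquad \frac{0.093\ \text{s}}{0.0055} \approx 16.9\ \text{s}
$$

$$
\text{annotation throughput} \approx \frac{6400\ \text{reviews}}{0.093\ \text{s}} \approx 6.9 \times 10^{4}\ \text{reviews/s}
$$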