Paper Title

Calibration of Google Trends Time Series

Author

West, Robert

Abstract

Google Trends is a tool that allows researchers to analyze the popularity of Google search queries across time and space. In a single request, users can obtain time series for up to 5 queries on a common scale, normalized to the range from 0 to 100 and rounded to integer precision. Despite the overall value of Google Trends, rounding causes major problems, to the extent that entirely uninformative, all-zero time series may be returned for unpopular queries when requested together with more popular queries. We address this issue by proposing Google Trends Anchor Bank (G-TAB), an efficient solution for the calibration of Google Trends data. Our method expresses the popularity of an arbitrary number of queries on a common scale without being compromised by rounding errors. The method proceeds in two phases. In the offline preprocessing phase, an "anchor bank" is constructed, a set of queries spanning the full spectrum of popularity, all calibrated against a common reference query by carefully chaining together multiple Google Trends requests. In the online deployment phase, any given search query is calibrated by performing an efficient binary search in the anchor bank. Each search step requires one Google Trends request, but few steps suffice, as we demonstrate in an empirical evaluation. We make our code publicly available as an easy-to-use library at https://github.com/epfl-dlab/GoogleTrendsAnchorBank.
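The two phases described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the G-TAB library's API or its exact algorithm. `trends_request` is a hypothetical offline stand-in for Google Trends that collapses each query's time series to a single peak value, scales the most popular query in the request to 100, and rounds. `build_anchor_bank` (offline phase) and `calibrate` (online binary search) are also hypothetical names. The pairwise chaining, the `min_score` threshold, and the 1/3 spacing between anchors are illustrative choices.

```python
def trends_request(queries, truth):
    """Hypothetical simulated Google Trends request (no network): at most
    5 queries, scaled so the most popular gets 100, then rounded."""
    if not 1 <= len(queries) <= 5:
        raise ValueError("a request holds between 1 and 5 queries")
    peak = max(truth[q] for q in queries)
    return {q: round(100 * truth[q] / peak) for q in queries}


def build_anchor_bank(anchors, truth):
    """Offline phase (sketch): chain pairwise requests between neighbouring
    anchors, ordered from most to least popular, so that every anchor is
    expressed relative to anchors[0], the reference query, fixed at 1.0."""
    bank = {anchors[0]: 1.0}
    for prev, cur in zip(anchors, anchors[1:]):
        r = trends_request([prev, cur], truth)
        # Assumes neighbours are close in popularity, so neither score is 0.
        bank[cur] = bank[prev] * r[cur] / r[prev]
    return bank


def calibrate(query, anchors, bank, truth, min_score=30):
    """Online phase (sketch): binary-search the ordered anchors for one
    against which `query` is well resolved (the smaller of the two scores
    is at least `min_score`), then map the query onto the bank's scale."""
    lo, hi = 0, len(anchors) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        a = anchors[mid]
        r = trends_request([query, a], truth)  # one request per search step
        if r[query] == 100 and r[a] < min_score:
            hi = mid - 1  # query dwarfs the anchor: try more popular anchors
        elif r[a] == 100 and r[query] < min_score:
            lo = mid + 1  # anchor dwarfs the query: try less popular anchors
        else:
            return bank[a] * r[query] / r[a]  # both scores well resolved
    return None  # query lies outside the popularity range the bank spans


# Toy data: hypothetical "true" popularities that a real user never sees.
anchors = [f"anchor_{i}" for i in range(12)]
truth = {a: 1_000_000 / 3**i for i, a in enumerate(anchors)}
truth["rare_query"] = 500.0

# Rounding problem: requested next to a popular query, the rare one is 0.
print(trends_request(["anchor_0", "rare_query"], truth))
# {'anchor_0': 100, 'rare_query': 0}

bank = build_anchor_bank(anchors, truth)
estimate = calibrate("rare_query", anchors, bank, truth)
print(estimate, truth["rare_query"] / truth["anchor_0"])
```

In this toy run, the rare query scores 0 next to `anchor_0`. Calibration recovers its popularity relative to the reference within about 6% after two search steps. Much of the remaining error comes from chaining the rounded score 33/100 instead of 1/3 at every link, so the error compounds along the bank. This is why the anchors must be chosen carefully.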