Paper Title

Robots Still Outnumber Humans in Web Archives, But Less Than Before

Paper Authors

Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle

Abstract

To identify robots and humans and analyze their respective access patterns, we used the Internet Archive's (IA) Wayback Machine access logs from 2012 and 2019, as well as Arquivo.pt's (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based on their browsing behavior. To better understand how users navigate through the web archives, we evaluated these sessions to discover user access patterns. Based on the two archives and between the two years of IA access logs (2012 vs. 2019), we present a comparison of detected robots vs. humans and their user access patterns and temporal preferences. The total number of robots detected in IA 2012 is greater than in IA 2019 (21% more in requests and 18% more in sessions). Robots account for 98% of requests (97% of sessions) in Arquivo.pt (2019). We found that the robots are almost entirely limited to "Dip" and "Skim" access patterns in IA 2012, but exhibit all the patterns and their combinations in IA 2019. Both humans and robots show a preference for web pages archived in the near past.
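
The abstract does not say how sessions were identified or how they were labeled as human or robot. The Python sketch below shows one way such a pipeline might look. It is not the authors' method. The record layout, the 10-minute inactivity timeout, the User-Agent keyword list, and the request-rate threshold are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after 10 minutes of inactivity (illustrative value).
SESSION_TIMEOUT = timedelta(minutes=10)
# Assumption: User-Agent substrings that hint at automated clients.
BOT_KEYWORDS = ("bot", "crawler", "spider", "curl", "wget")


def sessionize(records):
    """Group (ip, user_agent, timestamp, path) records into sessions.

    A new session starts whenever the same (ip, user_agent) pair has been
    idle for longer than SESSION_TIMEOUT.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, path in records:
        by_user[(ip, agent)].append((ts, path))

    sessions = []
    for (ip, agent), hits in by_user.items():
        hits.sort(key=lambda h: h[0])
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > SESSION_TIMEOUT:
                # Gap too long: close the current session and start a new one.
                sessions.append({"ip": ip, "agent": agent, "hits": current})
                current = []
            current.append(hit)
        sessions.append({"ip": ip, "agent": agent, "hits": current})
    return sessions


def looks_like_robot(session, max_rate=1.0, min_requests=5):
    """Heuristic label: True if the session resembles automated browsing."""
    agent = session["agent"].lower()
    paths = [path for _, path in session["hits"]]
    if any(keyword in agent for keyword in BOT_KEYWORDS):
        return True  # self-identified automated client
    if any(path.endswith("/robots.txt") for path in paths):
        return True  # humans rarely request robots.txt
    first, last = session["hits"][0][0], session["hits"][-1][0]
    duration = max((last - first).total_seconds(), 1.0)
    # Many requests at a sustained high rate suggest automation.
    return len(paths) >= min_requests and len(paths) / duration > max_rate
```

Calling `sessionize` on parsed log records and then `looks_like_robot` on each session gives a rough human/robot split. Any real analysis would need thresholds tuned to the archive's actual traffic.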