Statistical inference of travelers' route choice preferences with system-level data

Title


Statistical inference of travelers' route choice preferences with system-level data

Authors

Pablo Guarda, Sean Qian

Abstract


Traditional network models encapsulate travel behavior among all origin-destination pairs based on a simplified and generic utility function. Typically, the utility function consists solely of travel time, and its coefficients are equated to estimates obtained from stated preference data. While this modeling strategy is reasonable, the inherent sampling bias in individual-level data may be further amplified over network flow aggregation, leading to inaccurate flow estimates. These data must be collected from surveys or travel diaries, which may be labor-intensive, costly, and limited to a short time period. To address these limitations, this study extends classical bi-level formulations to estimate travelers' utility functions with multiple attributes using system-level data. We formulate a methodology grounded in non-linear least squares to statistically infer travelers' utility function in the network context using traffic counts, traffic speeds, traffic incidents, and sociodemographic information, among other attributes. The analysis of the mathematical properties of the optimization problem, and of its pseudo-convexity, motivates the use of normalized gradient descent. We also develop a hypothesis testing framework to examine the statistical properties of the utility function coefficients and to perform attribute selection. Experiments on synthetic data show that the coefficients are consistently recovered and that hypothesis tests reliably identify which attributes are determinants of travelers' route choices. In addition, a series of Monte Carlo experiments suggests that the statistical inference is robust to noise in the origin-destination matrix and in the traffic counts, and to various levels of sensor coverage. The methodology is also deployed at large scale using real-world multi-source data from Fresno, CA, collected before and during the COVID-19 outbreak.
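The abstract does not spell out the estimator, so the following is only a minimal Python sketch of the general idea under strong simplifying assumptions, not the authors' implementation. It assumes a logit path-choice model with a linear utility V = Z·θ over two attributes (travel time and incident count), a hypothetical toy network of 2 OD pairs with 3 routes each, one traffic-count sensor per route, and fixed (flow-independent) attributes, so the lower level of the bi-level problem is omitted entirely. θ is fitted by non-linear least squares on the counts using normalized gradient descent, θ ← θ − η_t ∇L/‖∇L‖, with a geometrically decaying step size (the decay schedule is an assumption). The t-statistics use the textbook asymptotic NLS covariance s²(JᵀJ)⁻¹, which may differ from the paper's exact test. All names (`path_flows`, `jacobian`, `estimate`, `t_tests`) and all numbers are illustrative.

```python
import numpy as np

# Hypothetical toy network (assumption): 2 OD pairs x 3 routes = 6 paths,
# with one count sensor per path, so predicted counts equal path flows.
# Simplification: attributes are fixed, i.e. the lower-level equilibrium
# problem of a full bi-level formulation is NOT modeled here.
OD_OF_PATH = np.array([0, 0, 0, 1, 1, 1])   # OD index of each path
DEMAND = np.array([1000.0, 800.0])           # trips per OD pair (hypothetical)
# Path attributes Z: column 0 = travel time, column 1 = number of incidents
Z = np.array([
    [1.0, 1.0], [1.5, 0.0], [2.0, 0.0],
    [1.2, 0.0], [1.0, 2.0], [1.8, 1.0],
])


def path_flows(theta):
    """Logit loading: f_k = q_od * exp(V_k) / sum_{j in od} exp(V_j)."""
    v = Z @ theta
    f = np.empty_like(v)
    for od, q in enumerate(DEMAND):
        idx = OD_OF_PATH == od
        e = np.exp(v[idx] - v[idx].max())    # shift by max for numerical stability
        f[idx] = q * e / e.sum()
    return f


def jacobian(theta):
    """df/dtheta via dp_k/dtheta = p_k * (Z_k - sum_j p_j Z_j) within each OD."""
    f = path_flows(theta)
    J = np.empty_like(Z)
    for od, q in enumerate(DEMAND):
        idx = OD_OF_PATH == od
        p = f[idx] / q                       # route choice probabilities
        z_bar = p @ Z[idx]                   # probability-weighted mean attributes
        J[idx] = q * p[:, None] * (Z[idx] - z_bar)
    return J


def estimate(counts, theta0, steps=3000, eta0=0.1, decay=0.998):
    """Normalized gradient descent on L(theta) = 0.5 * ||f(theta) - counts||^2."""
    theta = np.asarray(theta0, dtype=float)
    for t in range(steps):
        grad = jacobian(theta).T @ (path_flows(theta) - counts)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        # Unit-length direction; only the step size eta0 * decay**t shrinks
        theta = theta - eta0 * decay**t * grad / norm
    return theta


def t_tests(theta, counts):
    """Asymptotic NLS t-statistics: theta / sqrt(diag(s^2 (J^T J)^-1))."""
    r = path_flows(theta) - counts
    J = jacobian(theta)
    s2 = r @ r / (len(counts) - len(theta))  # residual variance estimate
    cov = s2 * np.linalg.inv(J.T @ J)
    return theta / np.sqrt(np.diag(cov))


# Synthetic experiment: generate noisy counts from known coefficients,
# then check whether they are recovered and judged significant.
theta_true = np.array([-2.0, -0.8])
rng = np.random.default_rng(0)
counts = path_flows(theta_true) + rng.normal(0.0, 5.0, size=6)
theta_hat = estimate(counts, theta0=[0.0, 0.0])
print("theta_hat:", theta_hat, "t-stats:", t_tests(theta_hat, counts))
```

Normalizing the gradient means each step moves a fixed distance regardless of how flat the objective is, which is one way to keep making progress where logit probabilities saturate and raw gradients become tiny. An attribute whose |t| falls below a chosen critical value (for example about 1.96) would be a candidate for removal in an attribute-selection loop.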


