Paper title

Instrument variable detection with graph learning: an application to high dimensional GIS-census data for house pricing

Authors

Xu, Ning; Fisher, Timothy C. G.; Hong, Jian

Abstract

Endogeneity bias and instrument variable validation have always been important topics in statistics and econometrics. In the era of big data, such issues typically combine with dimensionality issues and hence require even more attention. In this paper, we merge two well-known tools from machine learning and biostatistics, variable selection algorithms and probabilistic graphs, to estimate house prices and the corresponding causal structure using 2010 data on Sydney. The estimation uses a 200-gigabyte ultrahigh dimensional database consisting of local school data, GIS information, census data, house characteristics and other socio-economic records. Using "big data", we show that it is possible to perform data-driven instrument selection efficiently and purge the invalid instruments. Our approach improves the sparsity, stability and robustness of variable selection in the presence of high dimensionality, complicated causal structures and the consequent multicollinearity, and recovers a sparse and intuitive causal structure. The approach is also efficient and effective in endogeneity detection, instrument validation, weak instrument pruning and the selection of valid instruments. From the perspective of machine learning, the estimation results both align with and confirm the facts of the Sydney housing market, classical economic theories and the previous findings of simultaneous equations modeling. Moreover, the estimation results are consistent with and supported by classical econometric tools such as two-stage least squares regression and various instrument tests. All the code may be found at \url{https://github.com/isaac2math/solar_graph_learning}.
