Paper title
A Large-scale Industrial and Professional Occupation Dataset
Paper authors
Paper abstract
Interest in occupational data mining and analysis is growing. In today's job market, such analysis is increasingly important because it enables companies to predict employee turnover, model career trajectories, screen resumes, and perform other human resource tasks. A key requirement for these tasks is an occupation-related dataset. However, most studies use proprietary datasets or do not make their datasets publicly available, which impedes progress in this area. To address this issue, we present the Industrial and Professional Occupation Dataset (IPOD), which comprises 192k job titles belonging to 56k LinkedIn users. In addition to making IPOD publicly available, we also: (i) manually annotate each job title with its associated level of seniority, domain of work, and location; and (ii) provide embeddings for job titles and discuss various use cases. The dataset is publicly available at https://github.com/junhua/ipod.