Paper title

Should I disclose my dataset? Caveats between reproducibility and individual data rights

Authors

Benatti, Raysa M.; Villarroel, Camila M. L.; Avila, Sandra; Colombini, Esther L.; Severi, Fabiana C.

Abstract

Natural language processing techniques have helped domain experts solve legal problems. Digital availability of court documents increases possibilities for researchers, who can access them as a source for building datasets -- whose disclosure is aligned with good reproducibility practices in computational research. Large and digitized court systems, such as the Brazilian one, are prone to be explored in that sense. However, personal data protection laws impose restrictions on data exposure and state principles about which researchers should be mindful. Special caution must be taken in cases with human rights violations, such as gender discrimination, over which we elaborate as an example of interest. We present legal and ethical considerations on the issue, as well as guidelines for researchers dealing with this kind of data and deciding whether to disclose it.
