Last fall, the Tech Institute and Georgetown Professor Paul Ohm gathered experts to discuss the ethical reuse of data for machine learning. The workshop addressed a specific and challenging question: at a time when machine learning on vast data sets holds great promise for informing social science analysis, furthering medical research, and supporting other applications for social good, how should such uses be squared with concerns about privacy and user consent?
The half-day workshop brought together leading experts from academia (including computer science), government agencies, public interest groups, and private companies.
The workshop highlighted ways in which big data and machine learning appear to challenge traditional legal, ethical, and attitudinal approaches to limiting the reuse of data. Its primary question was when, if ever, it might be appropriate to share or reuse data that was originally gathered for a different purpose.
The workshop began with an overview of machine learning systems and of the Fair Information Practices (FIPs), which for decades have formed the backbone of data privacy regulations around the world. Participants debated whether the FIPs remain a practicable framework for analyzing privacy when big data and machine learning techniques are applied to particular data sets. The discussion then turned to identifying points of consensus and contention on the future of data privacy in an era of big data and machine learning.
Ultimately, participants' statements showed that the questions raised by data reuse in a machine learning world matter to broad segments of society but are difficult to resolve within current legal and institutional frameworks, making this area ripe for further analysis and research.
This roundtable is the first in a series of workshops on data governance that the Institute, Professor Ohm, and other Georgetown faculty will convene.