8 April 2018

Data for Research — Lessons from the Cambridge Analytica Debacle

Data from companies such as Facebook is an important resource for conducting research. But the law must ensure that it is processed in a de-identified form and not used for commercial gain.

One of the most curious aspects of the recent Cambridge Analytica debacle was that Facebook’s databases were not hacked. The journey of the information that wound up in the hands of Cambridge Analytica started on innocent terms. An academic named Aleksandr Kogan approached Facebook with a request to access information to conduct research. Facebook granted permission to Kogan, and the fallout from the subsequent actions of Kogan and Cambridge Analytica has been charted in much detail over the past week.

This case raises some questions about how a law on data protection must approach scenarios involving the use of data for research. The value of data for generating insights about human behaviour is not in doubt. That a giant corporation such as Facebook is a rich repository of data means that it can play an important role in fostering research by allowing researchers access to its databases. However, it is necessary that this access does not come at the cost of harm to the individuals who are parting with their data.

These questions have been part of the White Paper created by the Expert Committee headed by Justice Srikrishna on what a data protection law in India must look like. One of the biggest chapters in this White Paper is on the legitimate exemptions from the law, which includes an exemption for research and statistical purposes. The White Paper sought comments on the extent of the exemption that should be made available for such purposes.

The best approach for such cases would be to permit a limited exemption to use data for research, so long as that use complies with two conditions:

1. The information should be processed in a de-identified form.

This entails that the information be stripped of characteristics that help identify an individual. The easiest way of doing this is to process data in the aggregate. A necessary corollary of this is that it should be impossible to use the de-identified information to target individuals. So, while a particular demographic might behave in a certain way, it should be impossible from the research to ascribe this behaviour to a particular individual who might fall into that category. To ensure that de-identification is taken seriously, any attempt to re-identify such datasets must attract penalties under the law.

2. The information should not be used for commercial gain.

This is a straightforward requirement. The presence of a commercial motive might influence the way the research is conducted. To ensure that the research is ethical and academic integrity is not compromised, this requirement must also be strictly enforced.

Neither of these conditions was adhered to in the Cambridge Analytica case. The information gathered was used for a purpose other than mere research, and was used to target individuals during elections. This should serve as a wake-up call. India should adopt a robust data protection law that allows decisive action in such scenarios, should they come to light in this country.
