[UPDATE: The final paper has been published in Ethics and Information Technology]

Next week I will be attending the 8th International Conference of Computer Ethics: Philosophical Enquiry (CEPE) in Corfu, Greece, where I will present an early draft of a paper based on my critique of the “Taste, Ties, and Time” (T3) Facebook data release.

Recall that last fall, a group of researchers affiliated with the Berkman Center for Internet & Society at Harvard University released a dataset of Facebook profile information from an entire cohort (the class of 2009) of college students from “an anonymous, northeastern American university.” While the researchers took good-faith steps to preserve the anonymity of the source of the data (and, presumably, the privacy of the subjects), I quickly narrowed the source down to seven possible universities and then, with only a little more effort, identified it (with some confidence) as Harvard College. All this without ever downloading or looking at the actual data.

The researchers have since pulled the data out of circulation and plan to make it available again this month, presumably with some of the anonymity and privacy concerns addressed.

The draft paper I am presenting, “But the Data is Already Public”: On the Ethics of Research in Facebook, retells the circumstances around the T3 project and my partial re-identification of the dataset. It also describes some of the good-faith efforts the T3 researchers made to ensure the anonymity of the data, but exposes the limitations and errors in their procedures. Finally, it highlights the broader challenges that this case brings to light for research on and in social networking sites. These include:

  • the nature of consent in online research
  • identifying and respecting expectations of privacy on social network sites
  • developing sufficient strategies for data anonymization prior to the public release of potentially personally-identifiable data
  • measuring the relative expertise of institutional review boards when confronted with innovative research projects based on data gleaned from social media

Future versions of the paper will attempt to provide some guidelines in this regard. In the meantime, I welcome any comments on this draft. E-mail me if you would like to receive a copy.

The PDF of my CEPE presentation is here.